
Musings of an Anonymous Geek

Made with only the finest 1's and 0's


Awk Idioms: Shorten your pipelines, consolidate your tool set

Posted on July 27, 2009 by bkjones

I was lurking around on Twitter during my lunch hour (yes, even freelancers need a lunch hour), and @bitprophet tweeted thusly:

Get syslog-owned log names from syslog.conf: grep -v "^#" syslog.conf | awk '{print $2}' | egrep -v "^(\*|\|)" | sed "/^$/ d" | sed "s/^-//"

Followed by this:

(Interested to see if anyone can shorten my previous tweet’s command line, outside of using ‘cut’ instead of the awk bit.)

I happen to love puzzles like this, and my lunch was almost immediately followed by a long, boring conference call.

@bitprophet’s pipeline above is translated by my brain into the English:

Find non-commented lines, grab the second space-delimited field, then filter out the ones that start with “*” or “|”, then delete any blank lines, and strip any leading “-” from the result.
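For comparison, here is a sketch of the tweeted pipeline laid out one stage per line, with each comment matching a clause of the English version above. The function name `tweet_targets` is made up for illustration, and the `egrep -v "^(\*|\|)"` stage is written as the equivalent bracket expression `grep -v '^[*|]'`; otherwise the stages are the same as in the tweet.

```shell
# tweet_targets: hypothetical wrapper around the tweeted pipeline.
# Reads one file, or stdin if no file is given.
tweet_targets() {
    grep -v '^#' "$@" |       # find non-commented lines
        awk '{ print $2 }' |  # grab the second whitespace-delimited field
        grep -v '^[*|]' |     # filter out fields starting with "*" or "|"
        sed '/^$/d' |         # delete blank lines (left by blank input lines)
        sed 's/^-//'          # strip a leading "-"
}
```

Run it as `tweet_targets syslog.conf`, or feed the file on stdin.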

My brain usually tries to work out the English version of the solution *first*, and then I try to emulate that in the code or command I write. So, the issue here is that we want to find file paths (and apparently remote destinations are OK, too, since the original pipeline keeps entries starting with “@”, which syslog.conf uses to forward messages to a remote host). If it’s a file path, we want to see it in a form that would be suitable for passing to something like “ls -l”, which means leading symbols like “-” and “|” should be omitted.

In a syslog.conf file, the main meat is the rules: on each line you specify a selector (which facilities and priority levels to match, like “mail.*” or “*.err”) and an action, usually the file you want matching messages sent to (this is a simplistic explanation, but good enough to understand the solution I came up with). The file is also littered with comments. Here’s the file on my Mac:

*.err;kern.*;auth.notice;authpriv,remoteauth,install.none;mail.crit        /dev/console
*.notice;authpriv,remoteauth,ftp,install.none;kern.debug;mail.crit    /var/log/system.log

# Send messages normally sent to the console also to the serial port.
# To stop messages from being sent out the serial port, comment out this line.
#*.err;kern.*;auth.notice;authpriv,remoteauth.none;mail.crit        /dev/tty.serial

# The authpriv log file should be restricted access; these
# messages shouldn't go to terminals or publically-readable
# files.
auth.info;authpriv.*;remoteauth.crit            /var/log/secure.log

lpr.info                        /var/log/lpr.log
mail.*                            /var/log/mail.log
ftp.*                            /var/log/ftp.log

install.*                        /var/log/install.log
install.*                        @127.0.0.1:32376
local0.*                        /var/log/appfirewall.log
local1.*                        /var/log/ipfw.log
stuff.*                            -/boo
things.*                        |/var/log
*.emerg                            *
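Each active line is one rule: the first whitespace-separated field is the selector (one or more facility.priority entries separated by “;”) and the second is the action. As a sketch of that structure (the `show_rules` function is hypothetical, not part of the original post), awk’s built-in `split()` can break the selector apart and print every entry next to its destination:

```shell
# show_rules: hypothetical helper that lists each selector entry of every
# active rule next to its action. Reads one file, or stdin if none is given.
show_rules() {
    awk '
        # skip blank lines and comment lines
        $0 ~ /^$|^#/ { next }
        {
            # $1 is the selector list, e.g. "mail.*" or "*.err;kern.*"
            n = split($1, sel, ";")
            for (i = 1; i <= n; i++)
                print sel[i], "->", $2
        }
    ' "$@"
}
```

On the first line of the file above, for instance, it would print five lines, one per entry from “*.err” through “mail.crit”, each ending in “-> /dev/console”.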

So, in English, my brain parses the problem like this:

Skip blank lines, commented lines, and lines where the file name is “*”, and give me everything else, but strip off characters “-” and “|” before sending it to the screen.

And here’s my awk one-liner for doing that:

awk '$0 !~ /^$|^#/ && $2 !~ /^\*/ {sub(/^-|^\|/,"",$2);print $2}' syslog.conf
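The one-liner also drops neatly into a shell function. As a sketch (the `syslog_targets` name is just an illustration), this wraps the same awk program unchanged, so it can be pointed at a file or fed on stdin:

```shell
# syslog_targets: hypothetical wrapper around the one-liner.
# Prints the destination (field 2) of every active rule, skipping blank
# lines, comments and "*" (all logged-in users) destinations, and strips
# a leading "-" or "|". Reads one file, or stdin if none is given.
syslog_targets() {
    awk '$0 !~ /^$|^#/ && $2 !~ /^\*/ { sub(/^-|^\|/, "", $2); print $2 }' "$@"
}

# Usage:
#   syslog_targets syslog.conf
```

Against the sample file above it should print twelve destinations, from /dev/console down to /var/log, with “-/boo” and “|/var/log” reduced to /boo and /var/log and the “*.emerg” rule left out.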

Knowing a few key things about awk will help parse the one-liner above:

Awk automatically breaks up each line of input into fields. If you don’t tell it what to use as a delimiter, it splits on any run of spaces or tabs, ignoring leading and trailing whitespace. If you have a simple CSV file (no quoted commas), you’d likely use “awk -F,” to tell awk to use a comma. For /etc/passwd, use “awk -F:”. From there, you can reference the first field as $1, the second as $2, etc. $0 represents the whole line. There are more built-in variables (NF, for instance, holds the number of fields on the current line), but that’s enough for this example.
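Here is a small sketch of those field variables at work, with each command’s output in a comment; the sample lines are made up for illustration. The first command mixes leading spaces and tabs to show that default splitting copes with both, which matters for syslog.conf, where selector and action are often separated by tabs.

```shell
# Default splitting: runs of spaces/tabs separate fields and leading
# whitespace is ignored, so $2 is still the log file here.
printf '   mail.*\t\t/var/log/mail.log\n' | awk '{ print $2 }'
# prints: /var/log/mail.log

# NF is the number of fields; $0 is the untouched input line.
echo 'a b   c' | awk '{ print NF ": " $0 }'
# prints: 3: a b   c

# -F: switches the separator to a colon, as for /etc/passwd.
echo 'root:x:0:0:root:/root:/bin/sh' | awk -F: '{ print $1, $7 }'
# prints: root /bin/sh
```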

Though I think most sysadmins can get a lot done with simple usage like “awk -F: '{print $2}'”, sometimes more power is needed, and awk delivers. Its regexes are POSIX extended regular expressions (which is why the “|” alternation in /^$|^#/ needs no backslash), and it lets you check a field, or the whole line ($0, as I do in the one-liner), against a regex as a precondition for performing some action on that line or on one of its fields. So, in the one-liner, the pattern before the braces requires that the line be neither empty nor a comment, and, via a logical AND, that field 2 not start with “*”. Only lines that pass both checks reach the action inside the braces; every other line is skipped.
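Two details of the one-liner’s pattern are easier to see in isolation. First, a pattern with no action prints each matching line ($0) unchanged, which is a handy way to check what the filter lets through. Second, /^$/ only matches truly empty lines, so a line holding nothing but spaces slips through and the one-liner prints it as an empty line. The second command below is a variation, not the post’s one-liner: it assumes you want to skip such lines too, and uses NF (zero on blank or whitespace-only lines) plus a test on $1 so that indented comments are skipped as well.

```shell
# Pattern only, no action: awk prints each line that passes the filter.
printf '# comment\n\nlpr.info  /var/log/lpr.log\n*.emerg  *\n' |
    awk '$0 !~ /^$|^#/ && $2 !~ /^\*/'
# prints: lpr.info  /var/log/lpr.log

# Variation (an assumption, not the one-liner): NF is 0 on blank and
# whitespace-only lines, and testing $1 also catches indented comments.
printf '   \n   # indented comment\nmail.*  -/var/log/mail.log\n' |
    awk 'NF && $1 !~ /^#/ && $2 !~ /^\*/ { sub(/^-|^\|/, "", $2); print $2 }'
# prints: /var/log/mail.log
```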

Another nice thing about awk is that it actually is a Turing-complete programming language, not just a field extractor. Once a line has passed the one-liner’s checks, I know that I definitely want at least some portion of its $2. What I *don’t* want are leading characters like “-” or “|”, so I need to strip them from the file name. I use awk’s built-in “sub()” function to handle that, replacing a match for /^-|^\|/ in $2 with an empty string, and with that out of the way I call “print” to send the result to the screen.
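A quick sketch of how sub() behaves on the “stuff.*” line from the sample (the extra spaces are deliberate): it takes a regex, a replacement string, and a target, edits the target in place, and returns how many replacements it made, 0 or 1. Because the target here is a field, awk also rebuilds $0 using the output field separator (a single space by default), which is why the spacing collapses in the second line of output.

```shell
echo 'stuff.*     -/boo' | awk '{
    n = sub(/^-|^\|/, "", $2)   # n is 1: one leading "-" was removed
    print n, $2                 # prints: 1 /boo
    print $0                    # prints: stuff.* /boo
}'
```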

