Archive for the ‘Linux’ Category

Quick Loghetti Update

Monday, March 15th, 2010

For the familiar and impatient: Loghetti has moved to github and has been updated. An official release hasn’t been made yet, but cloning the repository and installing argparse will result in perfectly usable code. More on the way.

For the uninitiated, Loghetti is a command line log sifting/reporting tool written in Python to parse Apache Combined Format log files. It was initially released in late 2008 on Google Code. I used loghetti for my own work, which involved sifting log files with tens of millions of lines. Needless to say, it needed to be reasonably fast, and give me a decent amount of control over the data returned. It also had to be easy to use; just because it’s fast doesn’t mean I want to retype my command because of confusing options or the like.

So, loghetti is reasonably fast, and reasonably easy, and gives a reasonable amount of control to the end user. It’s certainly a heckuva lot easier than writing regular expressions into ‘grep’ and doing the ol’ ‘press & pray’.

Loghetti suffered a bit over the last several months because one of its dependencies broke backward compatibility with earlier releases. Such is the nature of development. Last night I finally got to crack open the code for loghetti again, and was able to put a solution together in an hour or so, which surprised me.

I was able to completely replace Doug Hellmann’s CommandLineApp with argparse very, very quickly. Of course, CommandLineApp was taking on responsibility for actually running the app itself (the main loghetti class was a subclass of CommandLineApp), and was dealing with the options, error handling, and all that jazz. It’s also wonderfully generic, and is written so that pretty much any app, regardless of the type of options it takes, could run as a CommandLineApp.

argparse was not a fast friend of mine. I stumbled a little over whether I should just update the namespace of my main class via argparse, or if I should pass in the Namespace object, or… something else. Eventually, I got what I needed, and not much more.

So loghetti now requires argparse, which is not part of the standard library, so why replace what I knew with some other (foreign) library? Because argparse is, as I understand it, slated for inclusion in Python 3, at which point optparse will be deprecated.

So, head on over to the GitHub repo, give it a spin, and send your pull requests and patches. Let the games begin!

CodeKata 4: Data Munging

Monday, December 28th, 2009

I’m continuing to take on the items in Dave Thomas’s Code Kata collection. It’s a nice way to spend a Sunday night, and it’s a good way to get my brain going again before work on Monday morning. It’s also fun and educational :)

CodeKata 4 is called “Data Munging”. It’s not very difficult data munging, really. I think the more interesting bit of the Kata is how to deal with files in different languages, what tools are well suited to the task, and trying to minimize code duplication.


Code Kata 4 instructs us to download weather.dat, a listing of weather information for each day of June, 2002 in Morristown, NJ. We’re to write a program that identifies the day which has the smallest gap between min and max temperatures.

Then, it says to download football.dat, a listing of season statistics for Soccer teams. We’re to write a program that identifies the team with the smallest gap between goals scored for the team, and goals scored against the team.

Once those are done, we’re asked to try to factor out as much duplicate code as possible between the two programs, and then we’re asked a few questions to help us think more deeply about what just transpired.

My Attack Plan

The first thing that jumped into my brain when I saw the data files was “Awk would be perfect for this”. I fiddled with that for a little too long (my awk is a little rusty), and came up with this (for weather.dat):

>$ awk 'BEGIN {min=100000}; $1 ~ /^[1-3]/ {x[$1]=$2-$3; if (x[$1]<min){ min=x[$1]; winner=$1}} END {print winner, min} ' weather.dat.dat
14 2

It works, and awk, though it gets ugly to some, reads in a nice, linear way to me. You give it a filter expression, and then statements to act on the matching lines (in braces). What could be simpler?

After proving to myself that I hadn’t completely lost my awk-fu, I went about writing a Python script to deal with the problem. I read ahead in the problem description, though, and so my script contains separate blocks for the two data sets in one script:

#!/usr/bin/env python
import sys
import string

data = open(sys.argv[1], 'r').readlines()

if 'weather' in sys.argv[1]:
   winner = 1000000
   winnerday = None

   for line in data:
      #filter out lines that aren't numbered.
      if line.strip().startswith(tuple(string.digits)):
         # we only need the first three fields to do our work
         l = line.split()[:3]
         # some temps have asterisks attached to them.
         maxt = l[1].strip(string.punctuation)
         mint = l[2].strip(string.punctuation)
         diff = int(maxt) - int(mint)
         if diff < winner:
            winner = diff
            winnerday = l[0]
   print "On day %s, the temp difference was only %d degrees!" % (winnerday, winner)

if 'football' in sys.argv[1]:
   winner = 1000000
   winnerteam = None

   for line in data:
      if line.strip().startswith(tuple(string.digits)):
         l = line.split()
         team, f, a = l[1], int(l[6]), int(l[8])
         diff = abs(f - a)
         if diff < winner:
            winner = diff
            winnerteam = team
   print "Team %s had a for/against gap of only %d points!" % (winnerteam, winner)

Really, the logic employed is not much different from the awk solution:

  1. Set a default for ‘winner’ that’s unlikely to be rivaled by the data :)
  2. Set the default for the winning team or day to ‘None’
  3. Filter out unwanted lines in the dataset.
  4. Grab bits of each line that are useful.
  5. Assign each useful bit to a variable.
  6. Do math.
  7. Do comparisons
  8. Present results.


Part 2 of the Kata says to factor out as much duplicate code as possible. I was able to factor out almost all of it on the first shot at refactoring, leaving only the line of code (per file) to identify the relevant columns of each data set:

#!/usr/bin/env python
import sys
import string

data = open(sys.argv[1], 'r').readlines()
winner_val = 1000000
winner_id = None

for line in data:
   if line.strip().startswith(tuple(string.digits)):
      l = line.split()
      if 'weather' in sys.argv[1]:
         identifier, minuend, subtrahend = l[0], int(l[1].strip(string.punctuation)), int(l[2].strip(string.punctuation))
      elif 'football' in sys.argv[1]:
         identifier, minuend, subtrahend = l[1], int(l[6]), int(l[8])
      diff = abs(minuend - subtrahend)
      if diff < winner_val:
         winner_val = diff
         winner_id = identifier

print winner_id, winner_val

Not too bad. I could’ve done some other things to make things work differently: for example I could’ve let the user feed in the column header names of the ‘identifier’, ‘minuend’ and ‘subtrahend’ columns in each data set, and then I could just *not* parse out the header line and instead use it to identify the list index positions of the bits I need for each line. It’d make the whole thing ‘just work’. It would also require more effort from the user. On the other hand, it would make things “just work” for just about any file with numbered lines of columnar data.

I have to admit that the minute I see columnar data like this, awk is the first thing I reach for, so I’m sure this affected my Python solution. The good news there is that my thinking toward columnar data is consistent, and so I treated both files pretty much the same way, making refactoring a 5-minute process.

In all, I enjoyed this Kata. Though I didn’t take it as far as I could have, it did make me think about how it could be improved and made more generic. Those improvements could incur a cost in terms of readability I suppose, but I think for this example it wouldn’t be a problem. I’m working on a larger project now where I have precisely this issue of flexibility vs. readability though.

I’m reengineering a rather gangly application to enable things like pluggable… everything. It talks to networked queues, so the protocol is pluggable. It talks to databases, so the database back end is pluggable, in addition to the actual data processing routines. Enabling this level of flexibility introduces some complexity, and really requires good documentation if we reasonably expect people to work with our code (the project should be released as open source in the coming weeks). Without the documentation, I’m sure I’d have trouble maintaining the code myself!

PyYaml with Aliases and Anchors

Tuesday, December 22nd, 2009

I didn’t know this little tidbit until yesterday and want to get it posted so I can refer to it later.

I have this YAML config file that’s kinda long and has a lot of duplication in it. This isn’t what I’m working on, but let’s just say that you have a bunch of backup targets defined in your YAML config file, and your program rocks because each backup target can be defined to go to a different destination. Awesome, right?

Well, it might be, but it might also just make your YAML config file grotesque (and error-prone). Here’s an example:

        host: foo
        dir: /Users/jonesy
        protocol: ssh
        keyloc: ~/.ssh/
            host: bar
            dir: /mnt/array23/homes/jonesy
            check_space: true
            min_space: 80G
            num_archives: 4
            compress: bzip2
        host: eggs
        dir: /Users/molly
        protocol: sftp
        keyloc: ~/.ssh/
            host: bar
            dir: /mnt/array23/homes/jonesy
            check_space: true
            min_space: 80G
            num_archives: 4
            compress: bzip2

Now with two backups, this isn’t so bad. But if your environment has 100 backup targets and only one destination, or…. heck — even if there are three destinations — should you have to write out the definition of those same three destinations for each of 100 backup targets? What if you need to change how one of the destinations is connected to, or the name of a destination changes, or array23 dies?

Ideally, you’d be able to reference the same definition in as many places as you need it and have things “just work”, and if something needs to change, you just change it in one place. Enter anchors and aliases.

An anchor is defined just like anything else in YAML with the exception that you get to label the definition block using “&labelname”, and then you can (de)reference it elsewhere in your config with “*labelname”. So here’s how our above configuration would look:

BackupDestination-23: &Backup_To_ARRAY23
    host: bar
    dir: /mnt/array23/homes/jonesy
    check_space: true
    min_space: 80G
    num_archives: 4
    compress: bzip2
        host: foo
        dir: /Users/jonesy
        protocol: ssh
        keyloc: ~/.ssh/
        Destination: *Backup_To_ARRAY23
        host: eggs
        dir: /Users/molly
        protocol: sftp
        keyloc: ~/.ssh/
        Destination: *Backup_To_ARRAY23

With only two backup targets, the benefit is small, but keep trying to imagine this config file with about 100 backup targets, and only one or two destinations. This removes a lot of duplication and makes things easier to change and maintain (and read!)

The cool thing about it is that if you already have code that reads the YAML config file, you don’t have to change it at all — PyYaml expands everything for you. Here’s a quick interpreter session:

>>> import yaml
>>> from pprint import pprint
>>> stream = file('foo.yaml', 'r')
>>> cfg = yaml.load(stream)
>>> pprint(cfg)
{'BackupDestination-23': {'check_space': True,
                          'compress': 'bzip2',
                          'dir': '/mnt/array23/homes/jonesy',
                          'host': 'bar',
                          'min_space': '80G',
                          'num_archives': 4},
 'Backups': {'Home_Jonesy': {'Destination': {'check_space': True,
                                             'compress': 'bzip2',
                                             'dir': '/mnt/array23/homes/jonesy',
                                             'host': 'bar',
                                             'min_space': '80G',
                                             'num_archives': 4},
                             'dir': '/Users/jonesy',
                             'host': 'foo',
                             'keyloc': '~/.ssh/',
                             'protocol': 'ssh'},
             'Home_Molly': {'Destination': {'check_space': True,
                                            'compress': 'bzip2',
                                            'dir': '/mnt/array23/homes/jonesy',
                                            'host': 'bar',
                                            'min_space': '80G',
                                            'num_archives': 4},
                            'dir': '/Users/molly',
                            'host': 'eggs',
                            'keyloc': '~/.ssh/',
                            'protocol': 'sftp'}}}

…And notice how everything has been expanded.


Python, PostgreSQL, and psycopg2’s Dusty Corners

Tuesday, December 1st, 2009

Last time I wrote code with psycopg2 was around 2006, but I was reacquainted with it over the past couple of weeks, and I wanted to make some notes on a couple of features that are not well documented, imho. Portions of this post have been snipped from mailing list threads I was involved in.

Calling PostgreSQL Functions with psycopg2

So you need to call a function. Me too. I had to call a function called ‘myapp.new_user’. It expects a bunch of input arguments. Here’s my first shot after misreading some piece of some example code somewhere:

qdict = {'fname': self.fname, 'lname': self.lname, 'dob': self.dob, 'city':, 'state': self.state, 'zip': self.zipcode}

sqlcall = """SELECT * FROM myapp.new_user( %(fname)s, %(lname)s,
%(dob)s, %(city)s, %(state)s, %(zip)s""" % qdict


There’s no reason this should work, or that anyone should expect it to work. I just wanted to include it in case someone else made the same mistake. Sure, the proper arguments are put in their proper places in ‘sqlcall’, but they’re not quoted at all.

Of course, I foolishly tried going back and putting quotes around all of those named string formatting arguments, and of course that fails when you have something like a quoted “NULL” trying to move into a date column. It has other issues too, like being error-prone and a PITA, but hey, it was pre-coffee time.

What’s needed is a solution whereby psycopg2 takes care of the formatting for us, so that strings become strings, NULLs are passed in a way that PostgreSQL recognizes them, dates are passed in the proper format, and all that jazz.

My next attempt looked like this:

curs.execute("""SELECT * FROM myapp.new_user( %(fname)s, %(lname)s,
%(dob)s, %(city)s, %(state)s, %(zip)s""", qdict)

This is, according to some articles, blog posts, and at least one reply on the psycopg mailing list “the right way” to call a function using psycopg2 with PostgreSQL. I’m here to tell you that this is not correct to the best of my knowledge.The only real difference between this attempt and the last is I’ve replaced the “%” with a comma, which turns what *was* a string formatting operation into a proper SELECT with a psycopg2-recognized parameter list. I thought this would get psycopg2 to “just work”, but no such luck. I still had some quoting issues.

I have no idea where I read this little tidbit about psycopg2 being able to convert between Python and PostgreSQL data types, but I did. Right around the same time I was thinking “it’s goofy to issue a SELECT to call a function that doesn’t really want to SELECT anything. Can’t callproc() do this?” Turns out callproc() is really the right way to do this (where “right” is defined by the DB-API which is the spec for writing a Python database module). Also turns out that psycopg2 can and will do the type conversions. Properly, even (in my experience so far).

So here’s what I got to work:

callproc_params = [self.fname, self.lname, self.dob,, self.state, self.zipcode]

curs.callproc('myapp.new_user', callproc_params)

This is great! Zero manual quoting or string formatting at all! And no “SELECT”. Just call the procedure and pass the parameters. The only thing I had to change in my code was to make my ‘self.dob’ into a object, but that’s super easy, and after that psycopg2 takes care of the type conversion from a Python date to a PostgreSQL date. Tomorrow I’m actually going to try calling callproc() with a list object inside the second argument. Wish me luck!

A quick cursor gotcha

I made a really goofy mistake. At the root of it, what I did was share a connection *and a cursor object* among all methods of a class I created to abstract database operations out of my code. So, I did something like this (this is not the exact code, and it’s untested. Treat it like pseudocode):

class MyData(object):
   def __init__(self, dsn): 
      self.conn = psycopg2.Connection(dsn)
      self.cursor = self.conn.cursor()

   def get_users_by_regdate(self, regdate, limit):
      self.cursor.arraysize = limit 
      self.cursor.callproc('myapp.uid_by_regdate', regdate)
      while True: 
         result = self.cursor.fetchmany()
         if not result:
         yield result

   def user_is_subscribed(self, uid): 
      self.cursor.callproc('myapp.uid_subscribed', uid)
      result = self.cursor.fetchone()
      val = result[0]
      return val

Now, in the code that uses this class, I want to grab all of the users registered on a given date, and see if they’re subscribed to, say, a mailing list, an RSS feed, a service, or whatever. See if you can predict the issue I had when I executed this:

    db = MyData(dsn)
    for id in db.get_users_by_regdate([joindate]):
        idcount += 1
        print idcount
        param = [id]
        if db.user_is_subscribed(param):
            print "User subscribed"
            skip_count += 1
            print "Not good"

Note that the above is test code. I don’t actually want to continue to the top of the loop regardless of what happens in production :)

So what I found happening is that, if I just commented out the portion of the code that makes a database call *inside* the for loop, I could print ‘idcount’ all the way up to thousands of results (however many results there were). But if I left it in, only 100 results made it to ‘db.user_is_subscribed’.

Hey, ‘100’ is what I’d set the curs.arraysize() to! Hey, I’m using the *same cursor* to make both calls! And with the for loop, the cursor is being called upon to produce one recordset while it’s still trying to produce the first recordset!

Tom Roberts, on the psycopg list, states the issue concisely:

The cursor is stateful; it only contains information about the last
query that was executed. On your first call to “fetchmany”, you fetch a
block of results from the original query, and cache them. Then,
db.user_is_subscribed calls “execute” again. The cursor now throws away all
of the information about your first query, and fetches a new set of
results. Presumably, user_is_subscribed then consumes that dataset and
returns. Now, the cursor is position at end of results. The rows you
cached get returned by your iterator, then you call fetchmany again, but
there’s nothing left to fetch…

…So, the lesson is if you need a new recordset, you create a new cursor.

Lesson learned. I still think it’d be nice if psycopg2 had more/better docs, though.

Python Quirks in Cmd, urllib2, and decorators

Wednesday, October 28th, 2009

So, if you haven’t been following along, Python programming now occupies the bulk of my work day.

While I really like writing code and like it more using Python, no language is without its quirks. Let me say up front that I don’t consider these quirks bugs or big hulking issues. I’m not trying to bash the language. I’m just trying to help folks who trip over some of these things that I found to be slightly less than obvious.

Python’s Cmd Module and Handling Arguments

Using the Python Cmd module lets you create a program that provides an interactive shell interface to your users. It’s really simple, too. You just create a class that inherits from cmd.Cmd, and define a bunch of methods named do_<something>, where <something> is the actual command your user will run in your custom shell.

So if you want users to be able to launch your app, be greeted with a prompt, type “hello”, and have something happen in response, you just define a method called “do_hello” and whatever code you put there will be run when a user types “hello” in your shell. Here’s what that would look like:

import cmd

class MyShell(cmd.Cmd):
   def do_hello(self):
      print "Hello!"

# Kick off the shell
shell = MyShell()

Of course, what’s a shell without command line options and arguments? For example, I created a shell-based app using Cmd that allowed users to run a ‘connect’ command with arguments for host, port, user, and password. Within the shell, the command would look something like this:

> connect -h mybox -u jonesy -p mypass

Note that the “>” is the prompt, not part of the command.

The idea here is that you pass the arguments to the option flags, and you can set sane defaults in the application for missing args (for example, I didn’t provide a port here — I’m leaning on a default, but I did provide a host, since the default might be ‘localhost’).

Passing just one, single-word argument with Cmd is dead easy, because all of the command methods receive a string that contains *everything* on the line after the actual command. If you’re expecting such an argument, just make sure your ‘do_something’ method accepts the incoming string. So, to let users see what “hello” looks like in Spanish, we can accept “esp” as an argument to our command:

class MyShell(cmd.Cmd):
   def do_hello(self, arg):
      print "Hello! %s" % arg

The problems come when you want more than one argument, or when you want flags with arguments. For example, in the earlier “connect” example, my “do_connect” method is still only going to get one big, long string passed to it — not a list of arguments. So where in a normal program you might do something like:

class MyShell(cmd.Cmd):
   def do_connect(self, host='localhost', port='42', user='guest', password='guest'):
      #...connection code here...

In a Cmd method, you’re just going to define it like we did the do_hello method above: it takes ‘self’ and ‘args’, where ‘args’ is one long line.

A couple of quick workarounds I’ve tried:

Parse the line yourself. I created a method in my Cmd app called ‘parseargs’ that just takes the big long line and returns a dictionary. My specific application only takes ‘name=value’ arguments, so I do this:

         d = dict([arg.split('=') for arg in args.split()])

And return the dictionary to the calling method. My connect method can then check for keys in the dictionary and set things up. It’s longer an a little more arduous, but not too bad.

Use optparse. You can instantiate a parser right inside your do_x methods. If you have a lot of methods that all need to take several flags and args, this could become cumbersome, but for one or two it’s not so bad. The key to doing this is creating a list from the Big Long Line and passing it to the parse_args() method of your parser object. Here’s what it looks like:

class MyShell(cmd.Cmd):
   def do_touch(self, line):
      parser = optparse.OptionParser()
      parser.add_option('-f', '--file', dest='fname')
      parser.add_option('-d', '--dir', dest='dir')
      (options,args) = parser.parse_args(line.split())

      print "Directory: %s" % options.dir
      print "File name: %s" % options.fname

This method is just an example, so don’t scratch your head looking for “import os” or anything :)

This is probably the more elegant solution, since it doesn’t require you to restrict your users to passing args in a particular way, and doesn’t require you to come up with fancy CLI argument parsing algorithms.

Using urllib2 for Pure XML Over HTTP

I wrote a web service client this week that does pure XML over HTTP to send queries to a service. I’ve written things like this before using Python, but it turns out, after looking back at my code, I was always either using XMLRPC, SOAP, or going through some wrapper that hid a lot from me in an effort to make my life easier (like the Google Data API). I’ve never had to try to send a pure XML payload over the wire to a web server.

I figured urllib2 was going to help me out here, and it did, but not before going through some pain due mainly to an odd pattern in various sources of documentation on the topic. I read docs at,, a couple of blogs, and did a Google search, and everything, everywhere, seems to indicate that the urllib2.Request object’s optional “data” argument expects a urlencoded string. From

data should be a buffer in the standard application/x-www-form-urlencoded format

The examples on every site I’ve found always pass whatever ‘data’ is through urllib.urlencode() before adding it to the request. I figured urllib2 was no longer my friend, and almost started looking at implementing an HTTPSClient object. Instead I decided to try just passing my unencoded data. What’s it gonna do, detect that my data wasn’t urlencoded? Maybe I’d learn something.

I learned that all of the documentation fails to account for this particular edge case. Go ahead and pass whatever the heck you want in ‘data’. If it’s what the server on the other end expects, you’ll be fine. :)


I found myself in dark, dusty corners when I had to decide how and where inside of a much larger piece of code to implement a feature. I really wanted to use a decorator, and still think that’s what I’ll wind up doing, but then how to implement the decorator isn’t as straightforward as I’d like either.

Decorators are used to alter how a decorated function operates. They’re amazingly useful, because instead of implementing some bit of code in a bunch of methods that themselves live inside a bunch of classes across various modules, or creating an entire class or mixin to inherit from when you only need the code overhead in a couple of edge cases, you can just create a decorator and apply it only to the proper methods or functions.

The lesson I learned is to try very hard to make one solid decision about how your decorator will work up front. Will it be a class? That’s done somewhat differently than doing it with a function. Will the decorator take arguments? That’s handled differently in both implementations, and also requires changes to an existing decorator class that didn’t used to take arguments. I don’t know why I expected this to be more straightforward, but I totally did.

If you’re new to decorators or haven’t had to dig into them too deeply, I highly recommend Bruce Eckel’s series introducing Python decorators, which walks you through all of the various ways to implement them. Part I (of 3) is here.

Sys/DB Admin and Coder Seeks Others To Build Web “A-Team”

Wednesday, September 9th, 2009

UPDATE: There’s no location requirement. I kind of assume that I’m not going to find the best people by geographically limiting my search for potential partners. :)

Me: Live in Princeton, NJ area. Over 10 years experience with UNIX/Linux administration, databases and data modeling, and PHP/Perl. About 3 years experience using Python for back-end scripting and system automation, and less than a year of Django experience. Former Director of Technology for (it was bought out), Infrastructure Architect at, and systems consultant/trainer. Creator of Python Magazine, former Editor in Chief of both php|architect *and* Python Magazine, and co-author of “Linux Server Hacks, volume 2″ (O’Reilly).

You are one of these:

  • Web graphic designer who has worked on several web-based projects for clients in various industries, understands current best practices and standards, has the tools and experience necessary to create custom graphics, and has some familiarity (secondarily) with PHP and/or Python, Javascript and Ajax. If you regularly make use of table-based web designs or ActiveX controls, this isn’t you.
  • Hardcore web developer with at least 6 years experience doing nothing but web-based projects using Javascript and (at some point) *both* PHP and Python, and has worked with or has an interest in Django, Cake, and other frameworks, and understands that client needs often don’t coincide with the religion of fanboyism. Knowledge of Javascript, Ajax, web standards and security is essential here. If your last “big project” was volunteer work to build a website for your kid’s soccer team, this isn’t you.
  • A generalist webmaster (sysadmin/db admin/scripter) with at least 6 years experience working in production *nix environments with good familiarity in the areas of high availability, web servers (specifically Apache), proxy servers and monitoring, and has worked with/supported users like the ones mentioned above on web-based projects. If you have to look at the documentation to figure out how to implement a 301 redirect, this probably isn’t you.

Experience working on a team in larger projects with multiple people would be good. Note that I’m looking for people to partner with on projects, I’m not hiring full time employees. Future partnership in a proper business is certainly a possibility, but… baby steps! I do have a couple of domains that would be great for use with this kind of project if it ever progresses that far :-)

I know that other people are out there looking for people to partner with on projects, but there doesn’t appear to be a common place for them to interact. Maybe that can be a project we undertake together :)  — if there *is* a place where people meet up for this kind of thing, let me know!

Let’s have fun, and take over the world! Shoot me an email at “bkjones” @ Google’s mail domain.

If You Code, You Should Write

Wednesday, September 9th, 2009

The Practice of Programming

Programmers are, in essence, problem solvers. They live to solve problems. When
they identify a problem that needs solving, they cannot resist the temptation
to study it, poke and prod it, and get to know it intimately. They then start
considering solutions. At this point, the programmer is not often thinking in
code — they’re thinking about the problem using high-level concepts and terms
that most non-programmers would understand.

Consider the problem of how to post a news story to a website. The programmer
might think about the solution this way:

  • Log in
  • Go to ‘new story’ page
  • Enter title and text
  • Press ‘submit’

Of course, there are a million details in between those points, and after them
as well. The programmer knows this, but defers thinking about details until the
higher-level solution makes sense and seems reasonable/plausible. Later in the
process they’ll think about things like the site’s security model, WYSIWYG
editors, tags and categories, icons, avatars, database queries and storage, and
the like.

Once they’ve reached a point where they’re satisfied that their solution will
work and is thoughtful of the major points to be considered in the solution,
they open an editor, and begin to type things that make no sense to their
immediate family. Programmers express their solutions in code, of course, but
they express them nonetheless, and this is not a trivial point.

The Parallels Between Programming and Writing

Writers often take the exact same course as do programmers. Programmers and
writers alike are often given assignments. Assignments take the form of a
problem that needs solving. For a programmer it’s a function or method or class
that needs implementing to perform a certain task. For a writer it’s an article
or column or speech that covers a particular topic. So in these cases, the
problem identification is done for you (not that more discovery can’t be done
— in both cases).

Next is the conception of the solution. Programmers puzzle over the problem,
its context in the larger application or system, its scope, and its complexity.
Writers puzzle over their topic space, its breadth and depth, and its context
in the bigger picture of what their publication tries to accomplish. In both
cases, writer and programmer alike take some time and probably kill some trees
as they attempt to organize their thoughts.

At some point, for both writer and programmer, the time comes to use some tool
to express their thoughts using some language. For a writer, they open a text
editor or word processor and write in whatever language the publication
publishes in. For the programmer, they open an IDE or editor and write using the
standard language for their company, or perhaps their favorite language, or (in
rare cases), the best language for accomplishing the task.

In neither case is this the end of the story. Programmers debug, tweak, and
reorganize their code all the time. Writers do the exact same thing with their
articles (assuming they’re of any length). Both bounce ideas off of their
colleagues, and both still have work to do after their first take is through.
Both will go at it again, both with (hopefully) a passion that exists not
necessarily for the particular problem they’re solving, but for the sheer act
of solving a problem (or covering a topic), whatever it may be.

Finally, once things are reviewed, and all parts have been carefully
considered, the writer submits his piece to an editor for review, and the
programmer submits to a version control system which may also be attached to an
automated build system. Both may have more work to do.

Starting Out

The process is essentially the same. If you’re a new programmer, you can expect
to have more than your fair share of bugs. If you’re a new writer, you can
likewise expect your piece to look a bit different in final form than it did
when you submitted it to the editor.

Just like programming, writing isn’t something you do perfectly from day one.
It’s something that takes practice. At first it seems like an arduous process,
but you get through it. As time passes, you start to realize that you’re going
faster, and stumbling less often. Eventually you get to a point where you can
crank out 1500-2000 words on your lunch hour without needing too much heavy

You Should Write

So, I say “you should write”. As someone who owes his career to books and
articles (not to mention friendly people far more experienced than myself), I
consider it giving back to the medium that launched my career, and helping
others like others helped me. I hope I can make the technological landscape
better in some small way. If we all did that, we’d be able to collectively
raise the bar and improve things together.

If altruism isn’t your bag, or you’re just hurting from the recent economic
crisis, know that it’s also possible to make money writing as well. It’s not
likely to become your sole occupation unless you happen to live in a VW Bus, or
you do absolutely nothing else but write full time, all the time. However, it
can be a nice supplement to a monthly salary, and if done regularly over the
course of a year is more than enough to take care of your holiday shopping

I’ve had good experiences writing for editors at php|architect and Python
Magazine (I *was* an editor at both magazines, but you don’t edit your own
work!), O’Reilly ( and a book as well), (when it was
under the auspices of the OSTG), TUX and Linux Magazine (both now defunct), and
others. I encourage you to go check out the “write for us” links on the sites
of your favorite publications, where you’ll find helpful information about
interacting with that publications editors.

Cool Mac/Mobile Software for Sysadmins, Programmers, and People

Thursday, September 3rd, 2009

I recently upgraded my primary workhorse (a MacBook Pro) to Snow Leopard. Before I did, I decided to go through and take stock of all of the documents and software I’d accumulated. While I was doing this, I simultaneously got into a conversation with a buddy of mine about the software he uses on his Macs. Turns out he maintains a whole page devoted to (mostly non-geek, but still somehow geeky) Mac software he uses.

I decided to go ahead and list the software I use for stuff whether it was geeky or not. Then I realized that pretty much all of the software I use is kinda geeky. I guess if you’re someone who’s going to create a list of software you use, it’s pretty hopeless.

So… here’s what I’m using. Suggestions welcome in the comments!

Social Media

My Twitter account updates my Facebook status. My Brightkite checkins update the location information on my Twitter account. It also sends a tweet… which updates my Facebook status. I pay less attention to the ongoing status in my LinkedIn account, but it gets updated automatically as well, I just don’t remember how or by what anymore.

I’ve tried a bunch of Twitter clients. Tweetie is “good enough”. It’s the one I use most often. If I need something hardcore I use Tweetdeck or TweetGrid, which has the benefit of being web-based.

TwitterLocal lets you put in your location and a radius, and then shows you tweets from people who are discernibly near you. I think Brightkite does a better overall job with this, since its whole reason for being is to be location-aware, but it seems like I get fewer updates than with TwitterLocal.


  • Colloquy
  • Tweetie
  • Mail
  • Skype
  • Google Talk

Right. Twitter is also a communication tool. I have, in fact, checked in with people via Twitter. It’s not how I typically use it, but I think it counts :)

I have to use both Skype and Google Talk because I’m on the road a lot (I’m a consultant) and there are enough hotels who do stupid things with their network that I’m forced to use whichever one works on that particular network. Though I mostly use GMail for mail, it’s gone down a few times on me, so it’s good to have Mail around. I’ve recently found GMail notifier to be almost useless as well, so when I use Mail, I find that getting alerted to incoming messages frees my brain. I use Mail.appetizer to show me previews of incoming mail so I don’t have to switch gears from what I’m doing to see the latest spam. Note, however, that it’s not quite ready for Snow Leopard.

I haven’t tried Mail in Snow Leopard yet. If they ever fix the search functionality (I find it useless) I’ll stop using the GMail interface. I’ve tried thunderbird, but its search is even worse (or was, the last time I tried it).

Fun Stuff

I play guitar and piano, and have also played drums, saxophone, and lots of other noise-making apparatuses. I like that GarageBand will let me put down bass and drum tracks without having to own a bass or drum set.

I also enjoy photography, though I don’t often get out on long quiet hikes in nature or gastronomical adventures that would make for the kinds of stunning things I see on Flickr all the time. However, I do have a family, and we do travel, so while not even 10% of my pics on Flickr are stock quality photos, at least 90% of them are interesting to me personally :)

iPhoto I see as a necessary evil these days. I used to love it, but now that it tries to help me out by autocategorizing on things that, as it turns out, are pretty arbitrary in the context of my life, I don’t like it as much. It’s good for quick touch-ups though. I’ve saved a number of pics with it.

StellaOSX is an Atari 2600 emulator for the Mac that comes with like, I dunno, thousands of ROMs? If you miss your old Atari games, and you have a Mac, it’s all you’ll ever need.

Sim City 4 is a city-building game. If you haven’t heard of Sim City before, it’s not like the Sims. At all. I don’t get that game, in fact. Sim City is a game where you have to try to build a city, build its wealth and prestige, and try to keep the residents happy as well.


Things for Mac is the first application I’ve personally seen that seamlessly syncs with Things for my iPhone. It works great. It’s not a full-blown project management solution, but it’s more than a todo list. It’s not about work-related stuff, either. Things is really about keeping my personal things in order. I have to call the township for an inspection on my recent AC replacement, schedule for a followup doctor visit for my dog, hire an insulation contractor by the fall, send out my quarterly taxes, make a dentist appointment… that kind of stuff. It’s also a great place to put ideas for blog posts and stuff, and since it’s right there on my iPhone, I don’t forget as many ideas anymore. I can’t say enough good things about Things, so I’ll just say go try it.

Google Calendar and iCal are kept in sync, so I don’t have to use the horrifically slow Google Calendar on my iPhone. I can sync to iCal on the desktop, sync that to my iPhone, and use iCal on the phone as well. Why the whole calendar synchronization thing has to *still* be hard after like 4 years of trying is beyond me.


Keynote makes doing things that are hard in PowerPoint and impossible in OpenOffice or Google Docs easy as all getout. As a trainer, I spend a lot of time putting content together and trying to find new ways to make it more engaging, less boring, etc. (not that I’ve been accused of being boring, mind you) 😉

I deliver all of my training from a MacBook Pro using either the remote that came with my laptop or the Remote iPhone application. Usually I can’t use Remote for iPhone because of restrictions regarding the wireless network, but I sometimes use it at home to rehearse new content.

I do use Google Docs for lots of other stuff. It’s not what I’d call full-featured, but when you discover that it’s integrated with Google Talk, it actually makes real-time collaboration pretty nice. Sadly, Microsoft Word is still the only word processing application I’ve seen with offline collaboration features that I’d call “pretty good”. Nothing I’ve seen recently can do what Word did 5 years ago in terms of collaboration. Again — sad.

Preview is a PDF viewer, but it also will do screen grabs. I know there’s a keyboard shortcut to do screen captures. I think it’s shift-command-4. I’m just as happy opening Preview, which is right there on the Dock anyway. It’s better than the old utility Apple provided for this, which would only save in TIFF format.

I feel like people look at me strange when I say that I use a dictionary every single day I’m on the computer (so… every day). I used it for this post, as a matter of fact (“apparatuses” still doesn’t sound right to me). I wish there was an app that could tell you how often you’ve used an app in the last day, week, month, etc. I’ll bet the Dictionary app outnumbers Mail (I usually only use Mail when GMail is down).

System Maintenance

  • Time Capsule/Time Machine
  • AppCleaner
  • Disk Inventory X
  • Apple Remote Desktop

I bought a Time Capsule. It’s an Apple product. It’s an enclosed 1TB hard drive inside of a wireless access point. It also has a USB port where you can connect a hub and then connect up other external USB hard drives, and a USB printer that can then be shared with the whole network without running a long-in-the-tooth Mac G4 with the mirrored doors and the fan that sounds like the landing of the mothership…. uh…. I mean… It’s really easy to use! I use it to back up all of the Macs in the house. The iPhone backs up to my Mac, so that’s covered too.

AppCleaner isn’t horribly useful, but I do use it, and it helps slightly. Maybe. It’s supposed to help you get rid of apps you no longer use, but it still leaves behind seemingly everything that would normally be left behind if you just opened Terminal and typed “sudo rm -rf ./AppName”. I give it the benefit of the doubt. Maybe it catches some stuff sometimes, and then I know all of the usual suspects that hang on to old app cruft, so I can clean some of it out manually without too much fuss.

Disk Inventory X is pretty cool. It presents a tree map view of the contents of your hard drive which makes it dead easy to spot where the disk hogs are. And here I was writing scripts for this 😉 It’s a great spotting tool, but because it’s constantly scraping the disk, it’s quite slow. You also can’t select multiple things in the interface and move them all to the trash at one time, which would be nice. Still, it definitely helped me find stuff I didn’t know was there, and that was taking up lots and lots of space.

Apple Remote Desktop isn’t something I use often, but it’s handy to have around. It lets you do all kinds of advanced stuff by connecting to the desktop of a remote Mac, but I just do simple things with it. If you didn’t know about it, it’s worth at least being aware of.

System Administration/Geekery

  • Terminal
  • Vim
  • SSH Tunnel Manager
  • VMware Fusion
  • Cisco VPN Client

This is the “where do I start” section for me. I do lots of geekery, and these tools facilitate a lot of the geekery. I stuck with the basics here. I use Terminal because tons of what I do is on the command line. There are things I do on the command line for which GUI applications exist, but to be honest, some of those cost money, and none of them are as efficient or reliable as the command line. I know that makes me sound like an old graybeard, but it’s mostly true. A GUI that really makes something you already know how to do on the command line easier is rare.

Vim, of course, runs inside of Terminal. If I’m writing a bunch of code across lots of files or something, I’ll try to use Komodo Edit (and I might upgrade to Komodo IDE), but if I’m on a remote machine, or I just need to do a quick edit here or there, one file at a time, I’ll just use Vim. Vim can do window splitting and code folding and stuff like that, so Komodo isn’t a requirement for me, it’s just slightly more convenient, and it has Vi key bindings :)

SSH Tunnel Manager is a GUI for managing SSH tunnels. Go figure. I’ve been using it for years now, but to be honest, if I don’t use it for a while, the interface becomes unintuitive to me and I go back to the command line or my SSH config file to set up tunnels.

VMware Fusion is great. I can test the latest Linux distros without devoting a whole machine to them, or I can run Windows and test web stuff in IE. There seems to be no end to the stuff I find myself using VMware Fusion for. Surprising.

I’m told there’s a VPN client built into Snow Leopard, but I haven’t tested it out yet. Some have reported issues, so hopefully they don’t bite me.


Komodo Edit is my favorite editor for writing code, period. If it didn’t have Vi keybindings, I’d likely just use Vim. And I do, sometimes. My first-choice language these days is Python, but I still write plenty of PHP, shell, SQL, Perl, etc. The Mac comes with XCode as an optional install, and I should really give it another shot, but in the past I’ve felt that it was kind of overwhelming, not to mention kinda clunky and slow.

Django is a Python web framework that comes with a development stand-in web server so you can do all of your development on the laptop, test it all locally, then push out to some environment that more closely matches production.

Speaking of pushing out changes, I mostly use Mercurial for my own projects nowadays, and I rather like it, but lots of things still use Subversion, which is wildly popular. My open source project actually uses Subversion with Google Code, but Google recently announced Mercurial support for hosted projects, so I’ll need to look at changing that over.

Fabric is a deployment tool. It’s written in Python and uses the paramiko library, which I found interesting, because I’d written a couple of automation scripts using paramiko that would have been easier to do with Fabric. I’ve only done simple things with Fabric so far, but it’s worth a look if you do a lot of rsync-ish stuff, followed by some “ssh in a for loop” stuff, supported by some cron jobs…. Fabric can really ease your life.

VMware Fusion is used in a programming context in two ways: to test web stuff on IE (I have an XP VM), and to work with libraries that are more convenient to work with under Linux than on the Mac. Sometimes Linux distros have things built-in that I’d have to build from source (along with all the dependencies) on the Mac.

Firebug is just basically a necessity if you do any kind of web development. It lets you inspect the design elements on the page visually, as well as in code, which makes debugging your CSS so easy it’s almost a non-event.

So… what tools are you using?

Create a Tagging Index Page with django-tagging

Tuesday, August 11th, 2009

For those not following along, I’ve been recreating using Django. It’s my first Django project that you could call “real work”. I’ve been using the Django documentation, various blogs, and the 2nd edition of “Practical Django Projects”, which has given me a lot of ideas, and a lot of actual code. Some of it worked, some of it didn’t, some of it didn’t do things the way I wanted, and some of it was left as an exercise to the user. This was the case with some of the django-tagging-related stuff, which has been broken on the site for a while.

I finally got tired of tagging not working properly on, so I started diving into the code and found that one of the things I wanted to do was actually pretty darn easy. In the process, I thought of something else I’ll probably implement later as well. Not *all* of my problems are solved, but I’m on my way!

So, Linuxlaboratory is made up of three different sections: a blog, which is its own app, a “Code” area which is another separate app, and a content management system (flatpages right now) that will handle storing republished articles when I get around to importing them all.

I enabled tagging on everything. I’m not solidly in one camp or the other on the whole “Tagging everything is bad” debate. Rather than theorize, I decided to give it a go and see how it does. My guess is that once I add a search box it will rarely actually be used, but what do I know?

The problem this presented me with was trying to figure out a way to present the user with one big monstrosity of an “index lookup” page, which would list all of the tags, and for each tag, list links to anything of any content type that was tagged with it. I understand that this could become unwieldy at some point, but if I need to I suppose I can pretty easily paginate, or present alphabet links, or perhaps both!

Though I understood the potential for future disaster, it still bothered me that I couldn’t find a quick answer to the question, so here it is for those wanting to do something similar with django-tagging. For reference, my tag index page is here.

Template Tags in django-tagging

I had actually started creating a custom template tag, and was looking at the Django docs, which stated “a lot of apps supply custom template tags”. Duh! I cd’d into the tagging directory, and there was a directory called “templatetags”. The file inside was pretty well documented, and the tag I was about to write myself is called ‘tagged_objects’. Here’s the docstring for that tag:

Retrieves a list of instances of a given model which are tagged with
a given “Tag“ and stores them in a context variable.


{% tagged_objects [tag] in [model] as [varname] %}

The model is specified in “[appname].[modelname]“ format.

The tag must be an instance of a “Tag“, not the name of a tag.


{% tagged_objects comedy_tag in tv.Show as comedies %}


Perfect. I already had a tag_list.html template (which, if memory serves, is one of the things left as an exercise to the user in Practical Django Projects), and it listed the tags in use on the site, but instead of linking off to a ‘tag_detail’ page for each tag, I envisioned something more interesting. I’m not there yet, but this index page is step one.

Putting Together a Template

What I needed to do was simply {% load tagging_tags %}, and then call the {% tagged_objects %} tag with the proper arguments, which consist of a tag *object* (not a tag name), the model you want to grab instances of, and a variable name you want to store the instance in. Here’s the content block from my tag_list.html:

{% block content %}
{% for tag in object_list %}
<div id="entry">
   <p>{{ }}</p>
   <ul>   {% load tagging_tags %}
      {% tagged_objects tag in  monk.Entry as taggedentries %}
      {% for entry in taggedentries %}
         <li><a href="{{ entry.get_absolute_url }}">{{ entry.title }}</a></li>
      {% endfor %}
      {% tagged_objects tag in ray.Snippet as taggedsnippets %}
      {% for snippet in taggedsnippets %}
         <li><a href="{{ snippet.get_absolute_url }}">{{snippet.title}}</a></li>
      {% endfor %}
{% endfor %}
{% endblock %}

So, the view I’m using supplies Tag objects in a variable called object_list. For each Tag object, I spit out the name of the tag. Underneath that on the page, for each tag, there’s an unordered list. The list items are the Entries from my “monk” application, and my Snippets from my “ray” application. I hope reading this template along with the bit above from the docstring for the template tag helps someone out. And check out the other tags in as well!

Rome Wasn’t Built in a Day

Of course, there’s still an issue with my particular implementation. Tagging was originally implemented specifically for the entries in the “/weblog/” part of my site. However, now that they’ve been applied to things in the “/snippets/” part of my site, this page doesn’t *really* belong in either one. However, if you go to the page, you’ll see that the “Blog” tab in the navigation bar is still highlighted. I’ll figure out what to do with that at some point. Until then, enjoy, and if you have any input or wisdom to share, please leave comments! Also, you should follow me on twitter!


My Django Project Update: RSS Feed, “Home” Link, and more.

Monday, August 10th, 2009

In continuing the rebuild of using Django, I’m happy to say that things have moved fairly smoothly. I’m using a good mix at this point of stuff from the 2nd edition of “Practical Django Projects”, the Django documentation, blog posts, and docs from other apps I’m making use of.


I said in one of my previous posts that I’d wait until I burned my feed before giving out the link, and I just did that, so if you want to subscribe to the LinuxLaboratory Blog feed, here’s the link to do that. Right now there’s just one feed for all of the blog entries, but since I post almost all of my really geeky stuff here, the LLO Blog will be mostly site updates like new articles, code, or features being added. The LLO Blog isn’t something that’s intended to get tons of traffic or have tons of posts all the time. The meat of the site will be the content management system which houses articles, and the “Snippets” area which will house scripts and hacks and stuff.

The “Home” Link in Django

I’m not sure why, but it took me a little time to figure out how to link to the base site in a Django template. I had some URL routing set up such that, well… here’s what I have:

In the main project’s

(r'^$', include('monk.urls.entries'))

I named my blog app “monk”, after Thelonius Monk. There’s actually a reason I picked his name for a blog app, but it’s not important right now (though, for a chuckle, I picked his last name because his first name breaks a long-standing “8 character” tradition in UNIX).

Anyway, in the corresponding URLConf in monk, I have:

(r'^$', 'archive_index', entry_latest_dict, 'monk_entry_archive_index')

And then in one of my base templates I had this (which is perfectly valid code):

<a href="{% url 'monk_entry_archive_index'}"></a>

The ‘{% url %}’ tag can take a URLConf name as an argument, and it’ll do a reverse lookup to get the URL, which is nice, except that this would always land people at “”, and I wanted them to just go to the base URL for the site. The canonical home page. The root URL. Whatever you want to call it.

There are multiple ways to link back to the base domain from within a template, but I’m not sure if there’s a canonical, “Django-sanctioned” method. You can just make an href pointing to “/”, you can hard-code the whole URL, and I found that doing “href={{ settings.SITE_ID }}” also worked just fine. I tried that last one after discovering that the base URL for the site isn’t in, and reading that SITE_ID was used by some applications to help them figure out their own URL routing. SITE_ID is a numeric value that represents, according to the Django docs, “the current site in the django_site database table”.

That’s a little confusing, but if you just have a look at the table, it starts to become clear how this could work:

mysql> select * from django_site; 
| id | domain              | name                |
|  1 | | | 

It seems logical that using {{ settings.SITE_ID }} in a template could cause the right things to happen, but I haven’t gone diving into the source code I’d need to to prove that it does.

What’s the canonical way of doing this?

Up Next…

So, I have what I think will be a decent setup for code sharing (complete with highlighted syntax), a solid foundation for a blog app, and I’m working on the content management system. I’m using TinyMCE in the admin interface to edit blog posts as well as the CMS content. I’ve got very very basic CSS in place. The basics are here. Now what?

Well, first I need to get my ducks in a row. This includes:

  • Stabilizing a proper development and deployment workflow. There’s a rather nice setup over here, and the 2nd edition of Practical Django Projects has also been enlightening in this regard.
  • Cleaning up my templates. I created the Blog app first, and so of course now I have like 3 separate apps, and I don’t really want them all to have a different look and feel, so I need to abstract some bits and perhaps create a “/templates/common/” directory that will be referenced by all of the apps in the project…? How do you do it?
  • I’d like to get some high-level navigation horizontally across the top instead of having those links in the sidebar. I don’t want fancy popout menus — just a very simple bar where the nav links basically just represent the functions of the different apps: Blog, Articles (CMS), Code, and maybe an About link or something.
  • And more!!

After that stuff is out of the way I’ll start thinking about some new features:

  • I’d like to be able to include images (maybe multiple images) in various types of content. In fact, perhaps all kinds of content. Screenshots for the code snippets, stock images for CMS and blog content, etc. I’m a little intimidated by this because I know my web host (webfaction) limits the amount of memory I can use at any given time. I guess I can just manually scp images to my static media location and link them into the content, but it doesn’t seem ideal. Ideally I can upload them in the same interface where I edit the content, and maybe have the img src tag associated with the instance of the model I’m editing in the database. Does something do this already?
  • I’d really like to have a wordpress-style “Stats” page. Actually, I’d like to have a much better stats page than the wordpress one, but that’d be a start. The stats page I’d actually *like* to have is best described by Marty Allchin here (2 years ago. Anyone know of an app that is aiming for anything close to that?)
  • RSS feeds for the code snippets section (this should be simple)
  • On-the-fly PDF generation for downloading the PDF version of… whatever. A code snippet, an article… Haven’t even investigated this yet.
  • I really, really, really, really want to turn my geek conference calendar into a google maps mashup using GeoDjango. This is another bit I’m slightly intimidated by, because I didn’t realize GeoDjango was built-in these days, and I started out down this road using MySQL as my database for whatever reason (instead of PostgreSQL, which I actually like better and, as it turns out, has way more mature GIS functionality).
  • At some point, I’d like to ensure that the old URLs to content that was on the old site will actually still work, and land people on the same content in the new site. The URL layout isn’t actually horrifically different, so hopefully I can get that in place without *too* much fuss. I know Django has a redirection app built-in, but I’m not sure if this is the right way to do it, or if I should just use Apache rewrite rules. Anyone compared the two?

So those are the big goals. Some are simple, others less so, but I hope to complete all of this within the next… well, before the new baby is born, which is going to be some time in the first half of September. Wish me luck (on both), and please share your tips on how I might accomplish any of the above goals (I’ve heard all the tips I can handle about kids, thanks).