Loghetti 0.9 Released: Now worthy of use!

The first released tarball of Loghetti was called the “IPO” release. This version actually warranted having an actual version number. I chose 0.9, and we’ll be moving toward 0.91, in .01 increments to a 1.0 release. Later on I’ll try to detail a roadmap, but I haven’t had enough feedback for that yet (though I’ve had some feedback, and it’s going to be worked into Loghetti soon).

So why is it worthy of use now? Here are a list of key features in 0.9:

  • It can take input from stdin or from a log file named as an argument.
  • You can write your own output plugin without knowing anything at all about Loghetti’s internals, so doing things like formatting output for MapReduce is Mind Numbingly Easy(tm). An example plugin that formats output for insertion into a database is included in the tarball. You’ll see that there is nothing loghetti-specific in the code except the name of the defined function: munge()
  • A few simple code changes and some lazy evaluation later, Loghetti 0.9 is several times faster than the IPO release, which is nice. It can now serve as a reasonable troubleshooting tool on 250MB log files.
  • Loghetti can report/filter on the key=value pairs in the query string. Passing ‘–urldata=foo:bar’ will return lines where foo=bar in the query string found in the request field.
  • You don’t have to get the whole line back in the output. You can tell Loghetti to return only the fields you want. I’ll document the names of the fields shortly, but for now, you can find them all defined in the apachelogs.py file.
  • And much, much more!

Thanks to Kent Johnson and Doug Hellmann, who signed up and were each a tremendous help both in helping me improve the performance of Loghetti, and teaching me a thing or two along the way.

There is, so far, one outstanding issue that is not yet fixed in 0.9: although I’ve tested Loghetti against several million log lines by now, others have occasionally found that some broken (malicious?) client software causes log lines to be created which do not conform to the Apache ‘combined’ log format. These will (presently) cause Loghetti to exit with an error. This is bad, but apparently is relatively rare. 0.9 does *not* contain a fix for this, because I was unsure which way to go with a solution. At this point, I think that, rather than code for every special case, what might happen is Loghetti will continue processing, and keep lines like this aside in a loghetti.log file, and tell you there were ‘x non-conformant lines’, and to see the log for details. Other ideas on how to deal with this are welcome, of course.