Archive for the ‘Loghetti’ Category

Quick Loghetti Update

Monday, March 15th, 2010

For the familiar and impatient: Loghetti has moved to GitHub and has been updated. An official release hasn’t been made yet, but cloning the repository and installing argparse will result in perfectly usable code. More on the way.

For the uninitiated, Loghetti is a command line log sifting/reporting tool written in Python to parse Apache Combined Format log files. It was initially released in late 2008 on Google Code. I used loghetti for my own work, which involved sifting log files with tens of millions of lines. Needless to say, it needed to be reasonably fast, and give me a decent amount of control over the data returned. It also had to be easy to use; just because it’s fast doesn’t mean I want to retype my command because of confusing options or the like.

So, loghetti is reasonably fast, and reasonably easy, and gives a reasonable amount of control to the end user. It’s certainly a heckuva lot easier than writing regular expressions into ‘grep’ and doing the ol’ ‘press & pray’.

Loghetti suffered a bit over the last several months because one of its dependencies broke backward compatibility with earlier releases. Such is the nature of development. Last night I finally got to crack open the code for loghetti again, and was able to put a solution together in an hour or so, which surprised me.

I was able to completely replace Doug Hellmann’s CommandLineApp with argparse very, very quickly. Of course, CommandLineApp was taking on responsibility for actually running the app itself (the main loghetti class was a subclass of CommandLineApp), and was dealing with the options, error handling, and all that jazz. It’s also wonderfully generic, and is written so that pretty much any app, regardless of the type of options it takes, could run as a CommandLineApp.

argparse was not a fast friend of mine. I stumbled a little over whether I should just update the namespace of my main class via argparse, or if I should pass in the Namespace object, or… something else. Eventually, I got what I needed, and not much more.

Loghetti now requires argparse, which is not part of the standard library. So why replace what I knew with some other (foreign) library? Because argparse is, as I understand it, slated for inclusion in the standard library (Python 2.7 and 3.2), at which point optparse will be deprecated.
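For anyone weighing the same choice, here is a minimal sketch of the two approaches mentioned above. It is not loghetti's actual code: the --code and --count options come from examples elsewhere on this blog, while build_parser and the Filter class are names made up for illustration.

```python
import argparse


def build_parser():
    """Build a parser for a few loghetti-style options (illustrative subset)."""
    parser = argparse.ArgumentParser(description="Filter Apache combined-format logs")
    parser.add_argument("--code", help="HTTP response code to match")
    parser.add_argument("--count", action="store_true",
                        help="print only the number of matching lines")
    parser.add_argument("logfile", nargs="?", help="log file (default: stdin)")
    return parser


class Filter(object):
    """Hypothetical main class; stands in for the real application object."""

    def __init__(self, options=None):
        # Approach 1 keeps the Namespace object around and reads options from it.
        self.options = options


# An explicit argument list keeps the sketch independent of sys.argv.
argv = ["--code", "404", "--count", "access.log"]

# Approach 1: pass the parsed Namespace into the main class.
app = Filter(build_parser().parse_args(argv))

# Approach 2: let argparse set the attributes directly on an existing object.
app2 = Filter()
build_parser().parse_args(argv, namespace=app2)
```

With the first approach the options live at app.options.code; with the second they become plain attributes such as app2.code. The first keeps parsed options clearly separate from the object's own state, which is probably the less surprising of the two.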

So, head on over to the GitHub repo, give it a spin, and send your pull requests and patches. Let the games begin!

Useful stuff – 2008 – first half

Friday, July 11th, 2008

Having a Google account is sometimes useful in ways you hadn’t planned for. For example, at a few different employers I’ve been at, I’ve had to prepare for reviews by providing a list of accomplishments to my supervisor. One decent tool for generating this list is email, though it can take some time. Another useful tool is the Web History feature of your Google account.

Though this isn’t necessarily indicative of everything I’ve accomplished in the first half of 2008 per se, it’s definitely indicative of the types of things I’ve generally been into so far this year, and it’s interesting to look back. What does your Web History say?

  • Gearman – this is used by some rather large web sites, notably Digg. It reminds me a little of having Torque and Maui, but geared toward more general-purpose applications. In fact, it was never clear to me that PBS/Maui couldn’t actually do this, but I didn’t get far enough into Gearman to really say that authoritatively.
  • How SimpleDB Differs from a Relational Database – Links off to some very useful takes on the “cloud” databases, which are truly fascinating creatures, but have a vastly different data management philosophy from the relational model we’re all used to.
  • Reblog – I found this in the footer of someone’s blog post. It’s kinda neat, but to be honest, I think you can do similar stuff using the Flock browser.
  • Google Finance APIs and Tools – did I ever mention that I had a Series 7 & 63 license two months after my 20th birthday? I love anything that I can think about for very long periods of time, where there’s lots and lots and LOTS of data to play with, where you can make correlations and answer questions nobody even thought to ask. Of course, soon after finding this page I found the actual Google Finance page, which answers an awful lot of potential questions. The stock screener is actually what I was looking to write myself, but with the data freely available, I’m sure it won’t be long before I find something else fun to do with it. I’m not a fan of Google’s “Feeds” model, but I’ve dealt with it before, and will do it again if it means getting at this data.
  • Bitpusher – it was recommended to me as an alternative to traditional dedicated server hosting. Worth a look.
  • S3 Firefox Organizer – This is a firefox plugin that provides an interface that looks a lot like an FTP GUI or something, but allows you to move files to and from “buckets” in Amazon’s S3 service.
  • Boto – A python library for writing programs that interact with the various Amazon Web Services. It’s not particularly well-documented, and it has a few quirks, but it is useful.
  • OmniGraffle – A Visio replacement for Apple OS X. I like it a lot better than Visio, actually. It has tons of contributed templates. You shouldn’t have any trouble making the switch. A little pricey, but I plunked down the cash, and have not been disappointed.
  • The Python Queue Module according to Doug – Doug Hellmann’s Python Module of the Week (PyMOTW) should be published in dead tree form some day. I happen to have some code that could make better use of queuing if it were a) written in Python, and b) used the Queue module. I was a little put off by the fact that every single tutorial I found on this module assumed you wanted to use threading, which I actually don’t, because I’m not smart enough…. though the last person I told that to said something to the effect of “the fact that you believe that means you’re smart enough”. Heh.
  • MySQL GROUP modifiers – turns out this isn’t what I needed for the problem I was trying to solve, but the “WITH ROLLUP” feature was new to me at the time I found it, and it’s kinda cool.
  • WordPress “Subscribe to Comments” plugin – Baron suggested that it would be good to have this, and I had honestly not even thought about it. But looking around, this is the only plugin of its kind that I found, and it’s only tested up to WP 2.3x, and I’m on 2.5x. This is precisely why I hate plugins (as an end user, anyway. Loghetti supports plugins) 😉
  • Lifeblogging – I had occasion to go back and flip through some of the volumes of journals I’ve kept since age 12, wondering if it might be time to digitize those in some form. I might digitize them, but they will *not* be public I don’t think. Way too embarrassing.
  • ldapmodrdn – for a buddy who hasn’t yet found all of the openldap command line tools. You can’t use ‘ldapmodify’ (to my knowledge) to *rename* an entry.
  • Django graphs – I haven’t yet tried this, because I’m still trying to learn Django in what little spare time I have, but it looks like there’s at least some effort towards this out there in the community. I have yet to see a newspaper that doesn’t have graphs *somewhere* (finance, sports, weather…), so I’m surprised Django doesn’t have something like this built-in.
  • URL Decode UDF for MySQL – I’ve used this. It works really well.
  • Erlang – hey, I’m game for anything. If I weren’t, I’d still be writing all of my code in Perl.
  • The difference between %iowait in sar and %util in iostat – I use both tools, and wanted the clarification because I was writing some graphing code in Python (using Timeplot, which rocks, by the way), and stumbled upon the question. Google to the rescue!
  • OSCON ’08 – I’m going. Are you going? I’m also going to the Oregon Brewers Festival on the last day of OSCON, as I did in ’06. Wonderful!
  • Explosion at one of my hosting providers – didn’t affect me, but… wow!
  • hypertable – *sigh* someday…when there’s time…
  • Small-scale hydro power – Yeah, I’m kind of a DIYer at heart. I do some woodworking, all my own plumbing, painting, flooring, I brew my own beer, I cook, I collect rain in big barrels, power sprinklers using pool runoff to give my lawn a jumpstart in spring… that kind of stuff. One day I noticed water coming out of a downspout fast enough to leap over one of my rain barrels and thought there must be some way to harness that power. Sadly, there really isn’t, so I did some research. It’s non-trivial.
  • You bet your garden – I also do my own gardening and related experiments.
  • RightScale Demo – WATCH YOUR VOLUME – a screencast showing off RightScale’s features. Impressive considering the work it would take me, a lone admin, to set something like this up. The learning curve involved in effectively/efficiently managing/scaling/monitoring/troubleshooting EC2 is non-trivial.
  • Homebrew Kegerator – Maybe if this startup is bought out I can actually afford this thing to put my homebrewed beer in. The 30-year-old spare fridge in the basement is getting a little… gamey.
  • The pound proxy daemon – I use this. It works well enough, but I’ve crashed it under load, too. I’ve also had at least one hosting provider misconfigure it on my behalf, and I had to go and tell them how to fix it :-/
  • Droid Sans Mono – a fantastic coding font. Installing this font is in my post-install routine for all of my desktops.
  • Generator tricks for systems programmers – David Beazley has made available a lot of Python source code and presentation slides from what I imagine was a great talk (if you’re a systems guy, which I am).
  • The Wide Finder Saga – I found this just as I was writing Loghetti. There are still some things in Mr. Lundh’s code that I haven’t implemented, but it was a fantastic lesson.
  • Using gnu sort for IP addresses – I’ve used sort in a lot of different ways over the years… but not for IP addresses. This is a nice hack for pulling this off with sort, but it doesn’t scale very well when you have millions of them, due to the sort utility’s ‘divide and conquer’ method of sorting.
  • Writing an Hadoop/MapReduce Program in Python – this got me over the hump.
  • Notes on using EC2/S3 – this got me over some other small humps.
  • BeautifulSoup – found while searching for the canonical way to screen scrape with Python. I’d done it a million times in Perl, and you can do it with httplib and regex and stuff in Python if you want, but this way is at least a million times nicer.
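On the IP-address sorting item in the list above: since most of this code ends up in Python anyway, the same numeric ordering can be sketched with a sort key. This is only an illustration for IPv4 addresses; ip_sort_key and the sample addresses are made up.

```python
def ip_sort_key(ip):
    """Turn '10.0.0.5' into (10, 0, 0, 5) so comparison is numeric per octet."""
    return tuple(int(octet) for octet in ip.split("."))


ips = ["10.0.0.20", "10.0.0.3", "192.168.1.1", "9.255.0.1"]

# A plain string sort would put '10.0.0.20' before '10.0.0.3'.
sorted_ips = sorted(ips, key=ip_sort_key)
```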

Well, that’s a decent enough summary I guess. As you can see, I’ve been doing a good bit of Python scripting. Most of my code these days is written in Python instead of Perl, in part because I was given the choice, and in part because Python fits my brain and makes me want to write more code, to push myself more. I’ve also been dealing with things involving “cloud” computing and “scalability” — like Hadoop, and EC2/S3. I haven’t done as much testing of the Google utility computing services, but I’ve used their various APIs for some things.

So what’s in your history?

Spring Means Blooming Flowers… and Ideas

Monday, April 21st, 2008

I seem to have found a pattern in my own internal workings. In the fall, I work furiously and get a lot done. Around the time of the winter holidays, I almost always do major personal web site changes and upgrades according to a mental list I’ve compiled over the previous year.

In the spring, I shake off the winter (I’m not a fan of winter), I brew my first batch of beer for the season (which symbolizes the end of winter, because I brew outdoors), and my brain starts to be flooded with new ideas. They range from the simplistic (maybe we should consider replacing windows in the house this year), to the slightly odd (why isn’t there a bluetooth setup that pairs two devices and alerts you if they get out of range, so if my daughter strays too far…), to the really useful (I should really take on that woodworking project to build that bookcase we desperately need), to the GEEKY!

This year I seem to be having a lot of geeky ideas. The difference is that, this year, I finally feel empowered enough to go after some of them. One idea that has come up is building an online brewer’s workshop. I would just build a GUI to do this for myself, but then I’d have to deal with which widget set to use, which platforms to support, and whatever else. Also, the final step in the evolution of a lot of GUIs is webification anyway. So I *think* this might be a job for Python, and I *think* I might try to do this using Django, which is fully supported by my web host (finally – see yesterday’s post)!

Brewing is one of those things that you can make as complex as you care to get. I started brewing with a buddy using a Coleman picnic cooler, a few buckets, and some odds and ends from the kitchen. Now I have a full three keg system, with pumps, plate chillers (small plate heat exchangers), fancy false bottoms, cool valves and tubing, and it involves relatively little manual labor. And that complexity can infect recipe development as well. Hops add bitterness by leaching alpha acids into the wort (the liquid that is not yet beer). Hop utilization calculations can be non-trivial and depend on many other factors in your system. Other characteristics depend heavily on the percent of available sugars you’re able to extract from the grains, your ability to keep a mash at a given temperature for a fixed period of time. This is easier to predict if you know, for example, the thermal mass of the vessels involved, and how much heat will be lost when you combine water and grain and stir. There are also proteins at work in the mash which can gum things up enough to make draining the liquid off a chore, so knowing what water/grain ratio to use is also important. And how quickly can you bring wort from boiling down to a temperature more friendly to yeast at the end of the cycle?

That’s a small fraction of the considerations you *could* make when brewing. I didn’t even touch on pH and water characteristics, or yeast attenuation! Needless to say, brewing with any consistency would be a great challenge and take a good bit more preparation without some tool to help you figure out how much water you’ll need, how many ounces of hops for how long, and how much grain you need to mash (and for how long), etc. There are lots of tools to help brewers out with this kind of stuff (ProMash is a popular one). The problem I have is that these tools are mostly commercial, proprietary, platform-specific ventures. I’d like to put one on the web that is at least “good enough”, and free for anyone to use. I’m open source that way (I’m happy to release the source as well).

Another tool I’d love to see is one that would let me manage my consulting business online. If BestPractical’s RT had a good PayPal plugin that would let you charge per ticket or charge for a bundle of so many tickets or something, that’d be a good start, but I’ve mucked with the code for RT (it’s written mostly in Perl), and it wasn’t a pleasant experience. This wouldn’t be a complete solution either, because most of my work is *not* simple support tickets, it’s large projects. For those I’d like people to be able to pay invoices online. There’s lots more I’d like to add on top of that, but that’s the general gist of it, and in the past I’ve been unable to find a really good solution, where “really good” is a completely nebulous term barely defined in my own head. 🙂

In addition to those ideas, I registered a couple of domains over the past year, and I hope to do some cool things with them as well if I ever get some time away from work and consulting. Oh yeah – I’ll also continue working on loghetti! Keep an eye out for updates. Maybe some people reading this have similar interests and would like to collaborate. Ciao for now!

Loghetti 0.9 Released: Now worthy of use!

Saturday, March 29th, 2008

The first released tarball of Loghetti was called the “IPO” release. This version warranted an actual version number. I chose 0.9, and we’ll be moving in .01 increments (0.91 next) toward a 1.0 release. Later on I’ll try to detail a roadmap, but I haven’t had enough feedback for that yet (though I’ve had some feedback, and it’s going to be worked into Loghetti soon).

So why is it worthy of use now? Here is a list of key features in 0.9:

  • It can take input from stdin or from a log file named as an argument.
  • You can write your own output plugin without knowing anything at all about Loghetti’s internals, so doing things like formatting output for MapReduce is Mind Numbingly Easy(tm). An example plugin that formats output for insertion into a database is included in the tarball. You’ll see that there is nothing loghetti-specific in the code except the name of the defined function: munge()
  • A few simple code changes and some lazy evaluation later, Loghetti 0.9 is several times faster than the IPO release, which is nice. It can now serve as a reasonable troubleshooting tool on 250MB log files.
  • Loghetti can report/filter on the key=value pairs in the query string. Passing ‘--urldata=foo:bar’ will return lines where foo=bar in the query string found in the request field.
  • You don’t have to get the whole line back in the output. You can tell Loghetti to return only the fields you want. I’ll document the names of the fields shortly, but for now, you can find them all defined in the file.
  • And much, much more!
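To give a feel for the output plugin idea in the list above, here is a rough sketch of what such a plugin might look like. Only the function name munge() comes from the post. The single-argument signature, the attribute names (ip, code, url), and the FakeLine stand-in are assumptions made for illustration.

```python
from collections import namedtuple

# Stand-in for a parsed log line; the real object and its field names may differ.
FakeLine = namedtuple("FakeLine", ["ip", "code", "url"])


def munge(line):
    """Format one parsed line as a tab-separated key/value record.

    The response code is the key and the rest is the value, which is the
    shape a streaming MapReduce job typically expects on stdout.
    """
    return "%s\t%s %s" % (line.code, line.ip, line.url)


record = munge(FakeLine(ip="10.0.0.3", code="404", url="/missing.html"))
```

Nothing in the function depends on the tool that calls it, which is the point of the plugin design: a different munge() could just as easily build an SQL INSERT statement.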

Thanks to Kent Johnson and Doug Hellmann, who signed up and were each a tremendous help, both in improving the performance of Loghetti and in teaching me a thing or two along the way.

There is, so far, one outstanding issue that is not yet fixed in 0.9: although I’ve tested Loghetti against several million log lines by now, others have occasionally found that some broken (malicious?) client software causes log lines to be created which do not conform to the Apache ‘combined’ log format. These will (presently) cause Loghetti to exit with an error. This is bad, but apparently is relatively rare. 0.9 does *not* contain a fix for this, because I was unsure which way to go with a solution. At this point, I think that, rather than code for every special case, what might happen is Loghetti will continue processing, and keep lines like this aside in a loghetti.log file, and tell you there were ‘x non-conformant lines’, and to see the log for details. Other ideas on how to deal with this are welcome, of course.
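The "keep going and set the bad lines aside" idea could look something like the sketch below. It is one possible shape, not what Loghetti does: the regular expression is a deliberately strict guess at the combined format, and parse_lines, COMBINED, and the logger name are hypothetical.

```python
import logging
import re

# A deliberately strict pattern for the Apache 'combined' format.
COMBINED = re.compile(
    r'^(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<code>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_lines(lines, reject_log):
    """Return (parsed, rejected): a list of field dicts for conformant lines
    and a count of non-conformant lines, each written to reject_log."""
    parsed = []
    rejected = 0
    for number, raw in enumerate(lines, 1):
        match = COMBINED.match(raw.rstrip("\n"))
        if match is None:
            rejected += 1
            reject_log.warning("line %d does not conform: %r", number, raw)
            continue  # keep processing instead of exiting with an error
        parsed.append(match.groupdict())
    return parsed, rejected


reject_log = logging.getLogger("loghetti.rejects")
# In real use: reject_log.addHandler(logging.FileHandler("loghetti.log"))
reject_log.addHandler(logging.NullHandler())

sample = [
    '10.0.0.3 - - [14/Mar/2008:10:00:00 -0400] "GET /a?foo=bar HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    'garbage that is not a log line',
]
parsed, rejected = parse_lines(sample, reject_log)
print("%d non-conformant lines" % rejected)
```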

Hadoop, EC2, S3, and me

Thursday, March 20th, 2008

I’m playing with a lot of rather large data sets. I’ve just been informed recently that these data sets are child’s play, because I’ve only been exposed to the outermost layer of the onion. The amount of data I *will* have access to (a nice way of saying “I’ll be required to wrangle and munge”) is many times bigger. Someone read an article about how easy it is to get Hadoop up and running on Amazon’s EC2 service, and next thing you know, there’s an email saying “hey, we can move this data to S3, access it from EC2, run it through that cool Python code you’ve been working with, and distribute the processing through Hadoop! Yay! And it looks pretty straightforward! Get on that!”

Oh joyous day.

I’d like to ask that people who find success with Hadoop+EC2+S3 stop writing documentation that makes this procedure appear to be “straightforward”. It’s not.

One thing that *is* cool, for Python programmers, is that you actually don’t have to write Java to use Hadoop. You can write your map and reduce code in Python and use it just fine.
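As a rough illustration of what that looks like, here is a sketch of a streaming-style map and reduce pair that counts response codes. Hadoop's streaming jar runs scripts like this through its -mapper and -reducer options, feeding them lines on stdin. The function names and the 'map'/'reduce' command-line argument are choices made for this example only.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'code<TAB>1' per log line; in combined format the status code is
    the first field after the quoted request."""
    for line in lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # skip lines without a quoted request field
        fields = parts[2].split()
        if fields:
            yield "%s\t1" % fields[0]


def reducer(lines):
    """Sum the counts per key; streaming delivers mapper output sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for code, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (code, sum(int(count) for _, count in group))


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    # Run as 'script.py map' or 'script.py reduce', reading from stdin.
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The same pair can be tried locally without Hadoop by piping a log through the map step, then sort, then the reduce step.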

I’m not blaming Hadoop or EC2 really, because after a full day of banging my head on this I’m still not quite sure which one is at fault. I *did* read a forum post that someone had a similar problem to the one I wound up with, and it turned out to be a bug in Amazon’s SOAP API, which is used by the Amazon EC2 command line tools. So things just don’t work when that happens. Tip 1: if you have an issue, don’t assume you’re not getting something. Bugs appear to be fairly common.

Ok, so tonight I decided “I’ll just skip the whole hadoop thing, and let’s see how loghetti runs on some bigger iron than my macbook pro”. I moved a test log to S3, fired up an EC2 instance, ssh’d right in, and there I am… no data in sight, and no obvious way to get at it. This surprised me, because I thought that S3 and EC2 were much more closely related. After all, Amazon Machine Images (used to fire up said instance) are stored on S3. So where’s my “s3-copy” command? Or better yet, why can’t I just *mount* an s3 volume without having to install a bunch of stuff?

This goes down as one of the most frustrating things I’ve ever had to set up. It kinda reminds me of the time I had to set up a beowulf cluster of about 85 nodes using donated, out-of-warranty PC hardware. I spent what seemed like months just trying to get the thing to boot. Once I got over the hump, it ran like a top, but it was a non-trivial hump.

As of now, it looks like I’ll probably need to actually install my own image. A good number of the available public images are older versions of Linux distros for some reason. Maybe people have orphaned them and gone to greener pastures. Maybe they’re in production and haven’t seen a need to change them in any way. I’ll be registering a clean install image with the stuff I need and trudge onward.

The Power of Open Source

Wednesday, March 19th, 2008

I think my very favorite aspect of the open source development model is that it allows me to practice the philosophies I use in my every day personal life, and apply them to software development as well. In my teens and early 20’s I read quite a lot of Aristotle and Plato, and a very major philosophy that I took away from all of that reading is “be conscious of your own ignorance”. And so I am.

There are just about a million reasons to start an open source project. In the case of loghetti, I made it a project because I know that there are things that other people know, which I do not know, but would probably like to know or benefit from knowing (we’ll not go into epistemological discussions – I’m just going to use the word “know” in the traditional sense here) 😉

Turns out, just knowing that there’s stuff out there that I don’t know has proven useful. Within hours of launching the Google Code site for the project, Kent Johnson joined the project, changed maybe 5 lines of code in the module, and according to my testing, that change resulted in a 6x speed increase. If you’re using loghetti from the SVN trunk, it’s gone from being sluggish for anything over 50MB, to being pretty darn quick even up to 250MB, at least for simple queries like --code=404 (which is what I do speed comparisons with). The changes will be in a tarball probably some time next week, for those who don’t want to use svn.

We haven’t even touched threading yet 😉

Loghetti is now an open source project

Tuesday, March 18th, 2008

I was getting feedback about loghetti, and it was all very useful, and it’s still coming in, and I can’t work full-time on it. At the same time, I’d love for some of the stuff I’ve read about to be implemented, because I certainly could make use of it myself.

So if anyone is interested, you can get loghetti, get more info about loghetti (it’s an apache log filter written in Python), or join the project here.

Feedback and Boredom Result in 35% Performance Boost for Loghetti

Friday, March 14th, 2008

Well, I got some feedback on my last post, and I had some time on my hands tonight, and Python is pretty darn easy to use.

As a result, loghetti is making great strides in becoming a faster log filter. To test the performance in light of the actual changes I’ve made, I’m asking loghetti only to filter on http response code, and I’m only asking for a count of matching lines. I’m only asking for the response code because I happen to know that it will cause loghetti to skip a lot of processing which once was done per-line on every run, but which is now done lazily, on an as-requested basis. So, for example, there’s no reason to go parsing out dates and query strings (two costly operations when you’re dealing with large log files) if the user just wants to query on response codes.

Put another way “Hey, I only want response codes, why should I have to wait around while you process dates and query strings?”
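The lazy, as-requested parsing described above can be sketched with properties that do the expensive work only on first access. This is an illustration of the technique rather than loghetti's code: the LogLine class and its attribute names are hypothetical, and it uses Python 3's urllib.parse where code of this era would have used cgi.parse_qs.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlsplit


class LogLine(object):
    """Hypothetical line object that parses costly fields only when asked."""

    def __init__(self, code, timestamp, url):
        self.code = code              # cheap: stored as-is
        self._timestamp = timestamp   # raw text, e.g. '14/Mar/2008:22:15:01 -0400'
        self._url = url
        self._date = None             # filled in on first access to .date
        self._urldata = None          # filled in on first access to .urldata

    @property
    def date(self):
        if self._date is None:
            self._date = datetime.strptime(self._timestamp,
                                           "%d/%b/%Y:%H:%M:%S %z")
        return self._date

    @property
    def urldata(self):
        if self._urldata is None:
            self._urldata = parse_qs(urlsplit(self._url).query)
        return self._urldata


line = LogLine("404", "14/Mar/2008:22:15:01 -0400", "/search?zip=10016")
is_404 = line.code == "404"   # no date or query-string parsing happens here
```

A query on response code alone never touches line.date or line.urldata, so neither strptime nor parse_qs runs for it.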

So, here’s where I was when this little solo-sprint started:

strudel:loghetti jonesy$ time ./ --code=404 --count 1MM.log
Matching lines: 10096

real 5m52.103s
user 5m35.196s
sys 0m3.214s

Almost 6 minutes to process one million lines. For reference, that “1MM.log” file is 246MB in size.

Here’s where I wound up as of about 5 minutes ago:

strudel:loghetti jonesy$ time ./ --code=404 --count 1MM.log
Matching lines: 10096

real 3m53.350s
user 3m50.498s
sys 0m1.641s

Hey, looky, there! I even got the same result back. Nice!

Ok, so it’s not what you’d call a ‘speed demon’, especially on larger log files. But testing with a 25MB log with 100k lines in it using the same arguments took 25 seconds, and at that point it’s at least usable, and I’m actually going to be using it to do offline processing and reporting, and it’ll be on a machine larger than my first-generation Intel MacBook Pro, and for that type of thing this works just fine, and it’s easier to run this than to sit around thinking about regular expressions and shell scripts all day.

I’m still not pleased with the performance – especially for simple cases like the one I tested with. I just ran a quick ‘grep | wc -l’ on the file to get the same exact results and it worked in about one half of one second! Sure, I don’t mind trading off *some* performance for the flexibility this gives me, but I still think it can be better.

For now, though, I think I might rather support s’more features, like supporting a comparison operator other than “=”, or specifying ranges of dates and times.
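One way the extra comparison operators might be supported is a lookup table from the operator text to a function in the standard operator module. This is a sketch of the idea only; COMPARATORS and matches are hypothetical names, and nothing here reflects how loghetti parses its options.

```python
import operator

# Map the operator text a user might type to a comparison function.
COMPARATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(value, op, wanted):
    """Return True when 'value <op> wanted' holds."""
    return COMPARATORS[op](value, wanted)


# A range of hours becomes two rules that must both hold.
hour = 16
in_business_hours = matches(hour, ">=", 9) and matches(hour, "<", 17)
```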

Loghetti Beta – An Apache Log Filter

Thursday, March 13th, 2008

I’m thinking about just making this an open source project hosted someplace like Google Code or something, because there are folks much smarter than myself who can probably do wonders with the code I’ve put together here. Loghetti takes an Apache combined format access log and a few options as arguments, throws your log lines through a strainer, and leaves you with the bits you actually *want* (kinda like spaghetti, but for logs) 😉

It’s written in Python, and the two dependencies it has are included in the tarball at the bottom. The dependencies are an altered version of Kevin Scott’s file (I’ve added more granular log line parsing), and Doug Hellmann’s CommandLineApp, which really made creating a CLI application a breeze, since it handles things like autogenerating options, help output, etc automatically without me having to mess with optparse.

So right now, I use it for offline reporting on what’s in my log files, and it’s great for that. I can run, for example:

./ --code=500 access.log

And get a listing of the log lines that have an http response code of 500. You can get fancier of course:

./ --ip= --urldata=foo:bar --month=1 --day=31 --hour=16 access.log

And that’ll return lines where the client IP is, with the date specified using the date-related options. The “--urldata” option allows you to filter log lines on the query string part of the URL. So, in the above case, it’ll match if you have something like “&foo=bar” in the query string of the URL.
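For the curious, matching a key:value rule against the query string might be done roughly as follows. This is a guess at the general approach, not the actual implementation: urldata_matches is a made-up name, and it uses Python 3's urllib.parse in place of the older cgi.parse_qs.

```python
from urllib.parse import parse_qs, urlsplit


def urldata_matches(request, rule):
    """Check a 'key:value' rule against the query string of a request field.

    'request' is the quoted request from the log line, for example
    'GET /search?foo=bar&page=2 HTTP/1.1'.
    """
    key, _, wanted = rule.partition(":")
    parts = request.split()
    if len(parts) < 2:
        return False  # malformed request field, nothing to match against
    query = parse_qs(urlsplit(parts[1]).query)
    return wanted in query.get(key, [])


hit = urldata_matches("GET /search?page=2&foo=bar HTTP/1.1", "foo:bar")
miss = urldata_matches("GET /search?foo=baz HTTP/1.1", "foo:bar")
```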

There are tons of features I’d like to support, but before I do, I feel compelled to address its performance on large log files. Once you throw this at a log file greater than about 50MB, it’s not a great real-time troubleshooting tool. I believe I’d be better off ripping some of the parsing out of and making it conditional (for example, don’t bother parsing out all of that date information if the user hasn’t asked to filter on it).

Anyway, it’s still useful as it is, so let me know your thoughts on this, and if it’s something you have a use for or would like to help out with, I’ll set up a project for it. For now, you can Download Loghetti

What I learned about Python Today – eval()

Tuesday, March 11th, 2008

I was writing some Python yesterday, and came across an issue that I thought was going to send me back to the drawing board.

I was using a module that, given an Apache access log, returns line objects with the fields of the line as attributes of the line object. It was certainly usable as-is, but I wanted more granular parsing of the fields, and if there were query string arguments (like “?foo=bar&page=stories”, etc), I wanted those broken up for easy access later as well.

I created a simple ruleset builder so I could pass arguments to my script and have them become rules that would filter the log and return the interesting bits according to the ruleset. So now we have two objects: the line object that has attributes like line.ip, and a rule object that only has three attributes: the attribute of the line you want, and the value of that attribute you want to filter on – and there’s also a comparison operator attribute, but right now it only holds an “=”. It’s a work in progress 🙂

So this means that you can do something like this:

getattr(line, rule.attr)

And if you passed “ip=” on the command line, then rule.attr will be “ip”, and the above will be parsed as “line.ip”, and you’ll get the expected result.
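Spelled out as a small runnable sketch, with stand-in classes, the setup looks something like this. The Line and Rule classes, the line_matches helper, and the sample values are invented for illustration; only the getattr(line, rule.attr) call is from the text above.

```python
class Line(object):
    """Stand-in for the parsed log line object."""

    def __init__(self, ip, code):
        self.ip = ip
        self.code = code


class Rule(object):
    """Three attributes: which line attribute, which value, which operator."""

    def __init__(self, attr, value, op="="):
        self.attr = attr
        self.value = value
        self.op = op  # only '=' is handled in this sketch


def line_matches(line, rule):
    # getattr(line, 'ip') is the same lookup as writing line.ip directly.
    return getattr(line, rule.attr) == rule.value


line = Line(ip="10.0.0.3", code="404")
rule = Rule("ip", "10.0.0.3")
matched = line_matches(line, rule)
```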

This fails for any attribute of “line” that isn’t a simple string – anything that has to be parsed as some kind of an expression. Like, say, references to keys and elements of dictionaries. I used cgi.parse_qs to parse my query string so I could access the different keys and values of the query string, and filter my logs using site-specific things like “zip=10016” or something. Of course, cgi.parse_qs returns a dictionary, which I called “urldata”. So, if you want to filter on “line.urldata['zip'][0]”, you should just find a way to assign that to rule.attr, right?

Wrong. getattr doesn’t work that way. It doesn’t look up the dictionary element, it just tags it onto the end of “line” and hopes for the best. It doesn’t evaluate expressions. If you wanted to get an element of a dictionary that is an attribute of “line” using getattr(), you’d have to do this:

getattr(line, rule.attr)['zip'][0]

Where “rule.attr” is just “urldata”.

Well that stinks for my purposes, because I don’t want a given type of argument passed in by the user to cause a special case in my code. I was thinking of alternative models to do all of this, but as usual, Doug had an answer right off the top of his head. His ability to do that sickens me at times. ;-P

The solution was to replace getattr(line, rule.attr) with eval(rule.attr, line.__dict__)

In this case, rule.attr = “urldata['zip'][0]”, but it’s not treated as “just a string”. In the case above, “line.__dict__” is a namespace used to search for and evaluate “urldata['zip'][0]”. The beauty of this is that as long as the names rule.attr refers to are defined in line.__dict__, rule.attr can be any expression, and this one line of code will handle it.
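Here is a small sketch of the difference, using a stand-in Line class and made-up sample data. One detail worth knowing: eval() inserts a __builtins__ entry into whatever dictionary it is given as globals, so the sketch passes a copy of line.__dict__ rather than the dictionary itself. It also uses Python 3's urllib.parse.parse_qs, which behaves like the older cgi.parse_qs for this purpose.

```python
from urllib.parse import parse_qs


class Line(object):
    """Stand-in for the parsed log line object."""

    def __init__(self, ip, query):
        self.ip = ip
        self.urldata = parse_qs(query)  # e.g. {'zip': ['10016'], ...}


line = Line("10.0.0.3", "zip=10016&page=stories")

# getattr treats the whole string as one attribute name, so this fails.
getattr_failed = False
try:
    getattr(line, "urldata['zip'][0]")
except AttributeError:
    getattr_failed = True

# eval treats the string as an expression and looks names up in the namespace.
# dict(...) makes a copy so eval's __builtins__ entry doesn't land on the line.
plain = eval("ip", dict(line.__dict__))
nested = eval("urldata['zip'][0]", dict(line.__dict__))
```

Since eval() will run any expression it is handed, this fits a tool where the person typing the rules is also the person running the script.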

That worked wonderfully.