I was writing some Python yesterday, and came across an issue that I thought was going to send me back to the drawing board.
I was using a module that, given an Apache access log, returns line objects with the fields of the line as attributes of the line object. It was certainly usable as-is, but I wanted more granular parsing of the fields, and if there were query string arguments (like “?f0o=bar&page=stories”, etc), I wanted those broken up for easy access later as well.
I created a simple ruleset builder so I could pass arguments to my script and have them become rules that would filter the log and return the interesting bits according to the ruleset. So now we have two objects: the line object that has attributes like line.ip, and a rule object that only has three attributes: the attribute of the line you want, and the value of that attribute you want to filter on – and there’s also a comparison operator attribute, but right now it only holds an “=”. It’s a work in progress 🙂
So this means that you can do something like this:
getattr(line, rule.attr)
And if you passed “ip=192.168.1.2” on the command line, then rule.attr will be “ip”, and the above will be parsed as “line.ip”, and you’ll get the expected result.
This fails for any attribute of “line” that isn’t a simple string – anything that has to be parsed as some kind of an expression. Like, say, references to keys and elements of dictionaries. I used cgi.parse_qs to parse my query string so I could access the different keys and values of the query string, and filter my logs using site-specific things like “zip=10016” or something. Of couse, cgi.parse_qs returns a dictionary, which I called “urldata”. So, if you want to filter on “line.urldata[‘zip’][0]”, you should just find a way to assign that to rule.attr, right?
Wrong. getattr doesn’t work that way. It doesn’t look up the dictionary element, it just tags it onto the end of “line” and hopes for the best. It doesn’t evaluate expressions. If you wanted to get an element of a dictionary that is an attribute of “line” using getattr(), you’d have to do this:
getattr(line, rule.attr)['zip'][0]
Where “rule.attr” is just “urldata”.
Well that stinks for my purposes, because I don’t want a given type of argument passed in by the user to cause a special case in my code. I was thinking of alternative models to do all of this, but as usual, Doug had an answer right off the top of his head. His ability to do that sickens me at times. ;-P
The solution was to replace getattr(line, rule.attr) with eval(rule.attr, line.__dict__)
In this case, rule.attr = “urldata[‘zip’][0]’, but it’s not treated as “just a string”. In the case above, “line.__dict__” is a namespace used to search for and evaluate “urldata[‘zip’][0]”. The beauty of this is that as long as the value of rule.attr is defined in line.__dict__, rule.attr can be any argument of any type, and this one line of code will handle it.
That worked wonderfully.