
What Ordinary Users Think About IE: Debunked

Wednesday, December 17th, 2008

Point all of your chain-mail-forwarding family and friends at this post. It’s a collection of things people have said to me, or that I’ve overheard, that reveal little tidbits about what people are thinking when they use IE.

I have to use IE – it’s my internet!

IE is not your internet. IE is what’s known as a web browser. There are lots of different web browsers. IE just happens to be the one that comes with Windows. It doesn’t make it a good browser or anything. It’s just there in the event that you have no other browser. If the only browser on your system is IE, the first thing you should do is use it to download Firefox by clicking here.

If IE is so horrible, how come everyone uses it?

They don’t, actually. There was a time not too long ago when over 90% of internet users used IE. However, with the constant flood of security issues (IE usage really should be considered dangerous at this point), IE’s horrible support of web standards (which makes it hard for web developers to create cool sites for you to use), and its inability to keep up with really cool features in modern browsers, its share of the internet usage market has been declining steadily over the last couple of years. In fact, this source puts IE usage at around 45% currently, so not even a majority of people use IE anymore, if statistics are to be believed. Accurate statistics for browser use are difficult to nail down, and are probably more useful for discerning a trend than for getting hard numbers. Still, the usage trend for IE is moving downward, steadily, and not particularly slowly. If you’re still using IE, you’re almost a dinosaur. Just about the entire tech-savvy world has migrated over to Firefox, with small contingents choosing Safari (Mac only) and Chrome (Windows only). Very small camps also use Opera and Konqueror.

This is also not to be trusted, but it’s my opinion based on observation of the IT field over the past 10 years: of the roughly 45% of people still using IE, probably half of them are forced to use it in their offices because they don’t have the proper permissions on their office computers to install anything else. The other half probably just don’t realize they have any choice in the matter. You do. There are other browsers. I’ve named a few in this post. Go get one, or three, of them.

Will all of the sites I use still work?

It has always been exceedingly rare that a web site actually *requires* IE in order to work properly. Your online banking, email, video, pictures, shopping, etc., will all still work. The only time you might need IE around is to use the Microsoft Update website. In all likelihood, you’ll be much happier with your internet experience using something like Firefox than you ever were with IE. Think about it this way: I’m a complete geek. I use the internet for things ordinary users didn’t even know you could do. I bank, shop, communicate, and manage projects, calendars and email online, and I registered and run my business completely online. It’s difficult to think of a task that can be done on the internet that I don’t use the internet for, and I haven’t used IE in probably 8 years, and have not had any issues. If you find a web site that absolutely, positively CANNOT be used UNLESS you’re viewing it with IE, please post it in the comments, and I’ll create a “hall of shame” page to list them all, along with alternative sites you can access WITHOUT IE, which probably provide a better service anyway :)

I’m not technical enough to install another browser.

Who told you that?! That’s silly. You installed Elf Bowling, didn’t you? C’mon, I know you did. Or what about that crazy toolbar that’s now fuddling up your IE window? Or those icons blinking down near the clock that you forgot the purpose of? At some point, you have installed something on your computer, and it was, in all likelihood, harder to do than installing Firefox would be. It’s simple. You go here, click on the huge Firefox logo, and it presents you with super-duper easy instructions (with pictures!) and a download. It takes less than 3 minutes to install, and you DO NOT have to know what you’re doing in any way or be geeky in any way to install it. If you can tell whether your computer is turned on or not, you’re overqualified to be a professional Firefox installer.

I Like IE. I have no problems with IE.

Whether you realize it or not, you have problems with IE, believe me. I had a cousin who said he had no problems with IE too. Then he came to my house one day, knocked on my door, and when I opened it, he handed me a hard drive from his computer. He said that all of his pictures of his first-born child were on there, and his computer had contracted a virus, and he couldn’t even boot from the hard drive. So it was up to me to recover the only pics he had of his only son being born. True story. Turns out, I tracked down the virus on the hard drive, and it was contracted by IE. Also, it wasn’t the only virus he had. If you think you’re safe because you have antivirus software, you’re sadly mistaken. He had it installed too, but it hadn’t been updated in 6 months, so any viruses released since the last update weren’t recognized by the antivirus software, and were allowed to roam freely onto his hard drive.

There has never, in the history of browsers, been a worse track record with regards to security than IE. Never. I promise – but you’re free to Google around for yourself. Half of the reason antivirus software even exists is purely to protect IE users (though email viruses are a problem independent of what browser you use, admittedly).

The other reason you might say you like IE is because you’ve never used anything else. As an alternative, I strongly suggest giving Firefox a shot.

Why do you care what browser I use?

I’m a technology guy. I’m one of those people that would work with technology even if he wasn’t being paid. Some people care about cooking, or quilting, or stained glass, or candlemaking, or knitting, or sewing, or horticulture, or wine. Heck, my mom cares about every single one of those things! Me, I care about technology, and I care about the internet. I want the internet to be a better place. Browsers play a non-trivial role in making the internet a better place. Also, one reason I care about technology is that it helps people do things they might otherwise be unable to do. Browsers enable users to do great things, and they allow us developers to make great things available to you. But when countless hours are spent trying to make things work with IE, it just slows everything down, and you don’t get cool stuff on the internet nearly as fast as you could.

So, it’s less about me caring what browser you use. In fact, I don’t really care if you use Firefox or not, it just happens to be the best browser out there currently. If you want to try something completely different, I encourage that too. It’s more about me caring about technology, the internet, and your browsing experience.

Open Source Technology US Conference Calendar

Tuesday, December 16th, 2008

One of the best ways to keep up with your field and network at the same time is to attend conferences. It’s one of the things I look forward to every year. After learning that O’Reilly has decided to commit blasphemy and *not* hold OSCON in Portland, Oregon the same week as the Oregon Brewers Festival, I was inspired to look around at what other conferences I might attend in 2009. Turns out, this is a huge pain in the ass, because I can’t find a single, central place that lists all of the conferences I’m likely to be interested in.

So… I created a public Google Calendar. It’s called “US Technical Conferences”. It needs more conferences, but I’ve listed the interesting ones I found. In order to keep the calendar from getting overwhelmingly crowded, I’ve decided that conferences on the list should:

  • Deal with open source technology in some way. This is purposely broad.
  • Be at least 3 days in length

If you want something added to the calendar, I’d be delighted to know about more conferences, so leave a comment! If you want to subscribe to the calendar, it’s public – the xml feed is here, and ical is here.

How Are You Staffing Your Startup?

Monday, December 15th, 2008

I have, in the past, worked for startups of varying forms. I worked for a spinoff that ultimately failed but had the most awesome product I’ve ever seen (neural networks were involved, need I say more?), I helped a buddy very early on with his startup, which did great until angel investors crept in, destroyed his vision, and failed completely to understand the Long Tail vision my buddy was trying to achieve, and I worked for a web 2.0 startup which was pretty successful, and was subsequently purchased… by another startup!

Working in academia for 6 years also exposed me to people who are firing up businesses, or projects that accidentally become businesses, and some of those go nowhere, while others seem to be on the verge of NYSE listing now, while a year ago they were housed in the smallest office I’ve ever seen, using lawn furniture for their workstations.

Of course, I’ve also consulted for, and been interviewed by, a host of other startups – recently, even.

First, the bad news

The bad news is that most or all of these startups are headed by developers, and they have applied *only* dev-centric thinking to their startup. They’ve thought about how to solve all of the app-level issues, mapped out use cases, drawn up interfaces, hacked together prototypes, and done all kinds of app-level work. Then, they’ve hired more developers. Then more after that.

Some seem to have given almost zero consideration to the fact that their application might become successful, and its availability might become quite critical. They haven’t given much thought to things like backups or disaster recovery. They have no plan for how to deploy their application such that when it comes time to scale, it has some hope of doing so without large amounts of downtime, or huge retooling efforts.

They’ve also given very little thought to how to enable their workforce to communicate, access their applications and data remotely without huge security compromises, and generally provide the back end system services necessary to run a business effectively (though, admittedly, most startups don’t require much in the way of things like NFS, or even internally-hosted email in the very beginning).

In short, they’ve either assumed that systems folks’ jobs are so easy that they can be handled by the developers, or they think that scalability lives entirely in their code, or they’re just not thinking about system-level aspects of their application at all. And don’t even get me started about the databases I’ve seen.

I know of more than one startup, right now, months late in going live. None of them, at the time I spoke to them, had a systems person on staff, or a deployment plan that addressed things that a systems person would address. What they had was a lot of developers and a deadline. Epic fail. Yes, even if you use agile development methodologies.

The Good

The good news is that, while some companies hire no systems folks at all and flounder around system-related issues forever, others hire at least one or two good systems folks, and make them a part of a collaborative effort to solve systems problems in interesting ways, utilizing the knowledge and experience of both systems and development personnel to create truly unique solutions. When sysadmins and developers collaborate to solve these issues, I have learned that they can create things that will blow your mind (in a good way).

In fact, Tom Limoncelli wrote recently that systems administration needs more PhDs. Well, I suppose that would help, but I think we’d get really far, really fast, if we could just break down some of the walls between sysadmins and developers, give them a common goal, and let them hash it out. Sysadmins have an understanding of the system and network-related issues that developers aren’t likely to have. Developers, in most cases, can probably write better code, much more quickly, than a sysadmin. Developers and sysadmins working together, sharing their knowledge and communicating with each other, can solve systems problems in new, unique, creative, and very effective ways.

The End

In the end, issues facing startups now blur the line between development and system administration a bit more than in the past. There are problems that need solving for which there is no RPM or Deb package. These problems require some knowledge of how related or analogous problems have been solved in the past, a knowledge of the systems and development approaches that have worked, and why, and enough experience to have seen where and when things go bad, and why. They also require creative and critical thinking. I think that good, senior systems and development people have these skills, and much more.

For whatever reason, it seems that the only time these two camps ever meet is on opposite sides of a deployment or application support problem. Perhaps this happens with enough frequency for people to think that the two camps can’t, or won’t, work together. They can, and they will. People with a love for technology will come together if the common goal involves furthering technology and its use, regardless of their background. Sure, it takes proper management like any other project, but it can be done.

If you’ve had experiences, good or bad, with dev/sysadmin collaboration, I’d love to hear your stories!

Help me pick a new feed reader

Friday, November 21st, 2008

I’ve been using Google Reader since it was created. I really love the *idea* of Google Reader. I like that scrolling through the posts marks them as read. I like that you can toggle between list and expanded views of the posts. I like that you can search within a feed or across all feeds (though selecting multiple specific feeds would be great).

All of that said, I’d like to explore other avenues, because I don’t like that there’s, like, zero flexibility in how the Google Reader interface is configured. My problem starts with large fonts…

I use relatively large fonts. If you increase the font twice up from the default size in Firefox on a Mac (using the cmd-+ keystroke, twice), and you have more than just a couple of feeds, you wind up with this really horrible side pane with the bottom half of it requiring a scroll bar, and the text wraps, and it just looks terrible. What makes this really REALLY REALLY annoying is that:

  1. I don’t use the features included in the *top* part of the side pane, ever, at all (like ‘trends’ and stuff), and
  2. You can’t resize or disable that part of the side pane.

I’ve used folders and some other features to try to alleviate the issue, but it’s just a compromise, and I’d rather not do that if something else would work better for me. I’ve had a couple of quick glances at just a couple of other readers, but I thought I’d get some input from the lazyweb to see what your thoughts are. Is there a browser-based feed reader that has some of Google’s niceties, but perhaps with a little bit nicer/more configurable interface? Out of curiosity, are you using a Mac-compatible fat-client reader that just totally r0cks in some way? If so, let me know in the comments.

MySQL Problem and Solution Posts: r0ck.

Tuesday, November 18th, 2008

Taming MySQL is… challenging. Especially in very large, fast-growth, ‘always-on’ environments. It’s one of those things where you seemingly can never know all there is to know about it. That’s why I really like coming across posts like this one from FreshBooks that describes a very real problem that was affecting their users, how they dealt with it, why *that* failed, and what the final fix was. Post a link to your favorite MySQL Problem and Solution post in the comments (oh yeah, and “subscribe to comments” should be working now!)

I’m a Top 25 Geek Blogger… for some value of “Top”

Monday, November 10th, 2008

I’m not someone who wakes up every day and looks at how my blog is ranked by all of the various services. I check out my WordPress stats, but that’s really about it. However, someone went and did some of the work for me, and they’ve decided that, of the blogs that they read or that were suggested to them, this blog ranks #20 in a listing of 25.

I’m really flattered, but wonder if it’s an indicator that this is a quality blog, or that they should aim higher in their blog reading ;-P  Either way, listing 25 bloggers in a flattering way is a fantastic marketing technique, because most of us are probably egomaniacal enough to say “Hey! Look!” and link back to the list on *your* blog, resulting in lots of traffic. Kudos, and thanks Mobile Maven!

Stop Doing Things That Don’t Work (a.k.a: Excel and Virtual Private Servers are Evil)

Wednesday, October 29th, 2008

Note that I’m talking about using these tools in some kind of professional way, and more specifically, I’m talking about using Excel as a database, and using VPS hosting to host “professional” web sites. By “professional”, I mean something other than your personal blog, picture gallery, or other relatively inconsequential site.

Excel is not a database

Here’s the thing: Excel isn’t a database. Most people who don’t work in IT don’t seem to understand this, and they’re deathly afraid to actually communicate with anyone in IT, so they take matters into their own hands, and create problems so big that IT is forced to get involved, because at some point this spreadsheet becomes “critical” to some business function. Then IT gets even more bitter toward the non-IT folk, validating some of the reasons the non-IT folk went that route in the first place, and virtually guaranteeing that they won’t come to the IT group next time either.

So, if you don’t work in IT and are not a geek, know this: Excel is not a database. Excel is not meant to manage data on a long-term basis. For everything you can do with Excel, there is almost certainly a better tool for the job. This isn’t to say that Excel is good for *nothing*, just that it’s generally not good in places where data needs to be managed over the longer term, shared with others, and relied upon for day-to-day operations of a business or department.

Find someone in IT who seems nice and “deals with databases”, and ask them what their thoughts are on the topic. Then tell them the *actual problem you’re trying to solve*, and ask how they would approach it. You’re not likely to hear “Excel” in the reply unless Excel is so rampant in your company that it’s become a corporate standard for creating data fiefdoms, which would be bad.

A VPS is Not “Professional Grade”. Ignore Adverts to the Contrary

No, really – I mean it. I’ve done plenty of consulting for companies who need some kind of fire put out for one of their web sites. Not long into the conversation I learn (for about 50% of the calls I get) that the site is externally hosted on a VPS. Occasionally I get people whose sites are, or are supposed to be, hosted on dedicated servers, but the actual VPS/dedicated server isn’t really the whole issue. The issue is with how these things are configured, and your ability to do what you need with them.

Marketing for VPS and dedicated server hosting often says “full root access” somewhere in the list of features. There are also specs like the CPU speed, amount of RAM, and bandwidth limits. All of these come together to give the unwitting customer the notion that they’re getting full root access to some kind of behemoth server with all kinds of resources. However, things go downhill when you see things like cPanel, Plesk, or anything else that looks like “easy management through web-based administrative interface”. Again, this is probably fine for something that gets 100 hits per month or so and isn’t critical. The minute you can attach a cost to the problems that can arise with your site, you need to ditch these hosting plans.

Why? There are numerous reasons, but I’ll start with three:

  1. There’s typically no failover or “high availability”: if one machine goes down, or one VPS on the same hardware goes nuts, you’ve just ceased to exist on the internet at all.
  2. The CPU and RAM advertised is used mostly by the bloated software used to automate the management and monitoring of the systems (in other words, it’s used by your hosting provider, not your own application).
  3. The system configurations I’ve seen in these environments border on absurd, and since the end user is managing all of this through a web interface, the only folks left to blame are the providers. So when you have problems, they’re guaranteed to be extra-challenging to solve.

What kinds of system configuration issues? Well, how about every service turned on, every port open (and not filtered) by default? How about downright broken service configurations, ranging from named.conf (DNS) configs specifying features that *can’t* work as configured, to crippled package management tools that disallow package modifications because doing so would break the monitoring/management tools, to php.ini files that turn on display_errors and turn *off* log_errors? In general, logging configurations are poor or worse, making problem-solving an uphill battle. Every time I log into a VPS I am typically shocked and appalled at what I find. Even if it’s $5 a month, it’s not worth it.
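If you want a quick way to spot the php.ini problem on a box you’ve inherited, here is a minimal sketch in Python. The directive names display_errors and log_errors are PHP’s real ones; everything else (the sample text, the read_flags and risky helpers, and treating a missing directive as “safe”) is my own assumption for illustration, not a complete php.ini parser.

```python
import re

# Hypothetical php.ini fragment showing the problem described above.
SAMPLE_INI = """\
; errors are shown to visitors but never written to a log
display_errors = On
log_errors = Off
"""

def read_flags(ini_text, names=("display_errors", "log_errors")):
    """Return {directive: bool} for the on/off directives found in ini_text."""
    flags = {}
    for line in ini_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop ; comments
        match = re.match(r"^(\w+)\s*=\s*(.+)$", line)
        if match and match.group(1) in names:
            value = match.group(2).strip().strip('"').lower()
            flags[match.group(1)] = value in ("1", "on", "true", "yes")
    return flags

def risky(flags):
    """True when errors are displayed to visitors or are not being logged.

    Assumption: a directive that is absent from the text is treated as safe.
    """
    return flags.get("display_errors", False) or not flags.get("log_errors", True)

flags = read_flags(SAMPLE_INI)
print(flags, "risky" if risky(flags) else "ok")
```

To check a real server you would read the actual file contents into read_flags instead of SAMPLE_INI; where that file lives depends on the host.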

Think about it: if you have a VPS and you have database corruption, what happens? You call support, who will probably just confirm or deny that actions performed by them or their automated routines had anything to do with the corruption (if they were forced to uncleanly reboot the machine, for example, that might explain things). Usually, they’ll say they don’t have any record of any events on the server that might be an issue, and you’ll need to fix it yourself (that’s what you wanted “full root access” for in the first place, right?).

So, you get a system or database guy to look into things. He’ll find that there are no logs, broken configurations, and when he tries to make a change, it’s either overwritten by these wacky automated management routines, or it breaks some part of the web-based management interface. He’ll also find that, while your web site uses about 128MB of the 512MB of available RAM, the host is running software that takes up double that amount of RAM. Wow, what a deal you got!

All of these issues, by the way, can also occur on dedicated servers, but what sets VPS services apart is the performance: it is, at the very best, unpredictable, and often bad. Some hosts try to market their way around this by charging you more money for “low-density” VPS “solutions”. Don’t buy it. It’s not a density issue. Even if you only share the hardware that runs your VPS with *one* other VPS, if that other VPS goes crazy and starts performing huge amounts of disk reads and writes, your site, even if there are only 3 people looking at it, is going to be slow.

The solution? Well, evaluate whether or not you really need the control a VPS gives you. If you’re just running WordPress, a simple CMS, or a brochure web site, you almost certainly don’t need a VPS. Get a web hosting plan. They often offer one-click installations of WordPress and CMSes like PostNuke, PHP-Nuke, Joomla, Drupal, etc, along with phpMyAdmin for doing database operations. LinuxLaboratory.org runs on Drupal and MySQL, and houses a bunch of articles about Linux, System/DB Administration, etc., that I’ve written over the years. It also presents a feed of the content on this blog, and it’s been running on a simple, cheap, web hosting plan for probably 7 or 8 years now. My uptime is better than the sites of friends of mine who decided they needed the control of a VPS. Same goes for this blog (though it’s a different provider). Heck, my beer blog runs on a *free* web hosting solution at DreamHost. It’s not super fast, but aside from that it serves its purpose well, and they have one-click installations for just about everything.

If you need to launch some kind of site that requires things not offered by a web hosting plan, then chances are you’re developing the site, or have some budget or staff for helping you set up/manage/troubleshoot the services you’ll run there. Check out Amazon EC2 and Google App Engine, and look into dedicated hosting to see if any of those meet your needs.

If you have an IT department, you could, of course, try to work with them on a solution. This is almost always the best solution over the long haul.

Generating Reports with Charts Using Python: ReportLab

Wednesday, October 22nd, 2008

UPDATE (Mar. 26, 2010) Just realized I never posted the link to the PDF the code here generates: here it is. My bad.

I’ve been doing a little reporting project, and I’ve been searching around for quite some time for a good graphing and charting solution for general-purpose use. I had come across ReportLab before, but it just looked so huge and convoluted to me, given the simplicity of what I wanted at the time, that I moved on. This time was different.

This time I needed a lot of the capabilities of ReportLab. I needed to generate PDFs (this is not a web-based project), I needed to generate charts, and I wanted the reports I was generating to contain various types of text objects in addition to the charts and such.

I took the cliff-dive into the depths of the ReportLab documentation. I discovered three things:

  1. There is quite a lot of documentation
  2. ReportLab is quite a capable library
  3. The documentation actually belies the simplicity of the library.

It’s a decent bit easier than it looks in the documentation, so I thought I’d take you through an example. This example is dead simple, but I still think it’s a little more practical than what I was able to find. The ReportLab documentation refers to what sounds like a great reference example, but the problem is that the tarball I downloaded didn’t contain the files it was making reference to :(

I started out by investigating one of the small example projects in the “demo” directory of the ReportLab distribution. It was called “gadflypaper” (ironically, written by Aaron Watters. I worked in the cube outside of his office for several months last year. Hi Aaron!). Aaron’s example was very simple, and a great starting point for understanding how to put together a very basic document. It’s not infested with abstractions: just a few simple functions, and a lot of text. I ripped out a lot of the text until I had just an example of each function in action, and then set to work.

The Basic Process

To simplify the work of doing page layout minutiae, I (like the example) used PLATYPUS, which is built into ReportLab and abstracts away some of the low-level layout details. If you *want* low-level control, however, you can do whatever you want with the pdfgen module, also included (and PLATYPUS is basically a layer built from it).

With PLATYPUS, you get access to a bunch of prebuilt layout-related objects, representing things like paragraphs, tables, frames, and other things. You also have access to page templates, so that dealing with things like frame placement is a little easier.

So, to give you a rundown of the high-level steps:

  1. Choose a page template, and use it to create a document object.
  2. Create your “flowables” (paragraphs, charts, images, etc), and put them all into a list object. In ReportLab documentation, this is often referred to as a list named “story”.
  3. Pass the list object to the build() method of the document object you created in step 1.

Phase 1: Let’s Get Something Working

As a first phase, let’s just make sure we can do the simplest of documents. Here’s some code that should work if you have a good installation of ReportLab (I’m using whatever was the latest version in early October, 2008.) Note that we’ll be cleaning this up and simplifying it as we go along.

#!/usr/bin/env python

# Import only the PLATYPUS pieces this phase actually uses.
from reportlab.platypus import Paragraph, SimpleDocTemplate
from reportlab.lib.styles import getSampleStyleSheet

styles = getSampleStyleSheet()

# Each string becomes a Paragraph flowable with a named style.
Title = Paragraph("Generating Reports with Python", styles["Heading1"])
Author = Paragraph("Brian K. Jones", styles["Normal"])
URL = Paragraph("http://protocolostomy.com", styles["Normal"])
email = Paragraph("bkjones +_at_+ gmail.com", styles["Normal"])
Abstract = Paragraph("""This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab.""", styles["Normal"])

# The list of flowables, in the order they should appear in the document.
Elements = [Title, Author, URL, email, Abstract]

def go():
    # Create the document from a page template and build it from the list.
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

go()

Not a lot of actual code here. It’s mostly variable assignments. The variables are mostly just strings, but because I want to control how they’re arranged, I need to make them “Flowables”. Remember that PLATYPUS puts together a document by processing a list of Flowable objects and drawing them onto the document. So all of our strings are “Paragraph” objects. You’ll note, too, that Paragraph objects can be styled using definitions accessed from getSampleStyleSheet, which returns a ‘style object’. If you create one of these at the Python interpreter, and call the resulting object’s ‘list()’ function, you’ll see what styles are available, and you’ll also see what attributes each style has. Try running this code to make sure things work. Change the strings if you like :)

Phase 2: Simple Cleanup

I haven’t yet created insane layers of abstraction in my own code, because I’ve been working on deadlines and doing things that are relatively simple. This will inevitably change :)  However, there are some things you can do to make life a bit simpler and cleaner.

#!/usr/bin/env python

from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

PAGE_HEIGHT=defaultPageSize[1]
styles = getSampleStyleSheet()
Title = "Generating Reports with Python"
Author = "Brian K. Jones"
URL = "http://protocolostomy.com"
email = "bkjones@gmail.com"
Abstract = """This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab."""
Elements=[]
HeaderStyle = styles["Heading1"]
ParaStyle = styles["Normal"]
PreStyle = styles["Code"]

def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    Elements.append(s)
    para = klass(txt, style)
    Elements.append(para)

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

header(Title)
header(Author, sep=0.1, style=ParaStyle)
header(URL, sep=0.1, style=ParaStyle)
header(email, sep=0.1, style=ParaStyle)
header("ABSTRACT")
p(Abstract)

go()

So, this is still simple. Simplistic, even. All I did was move the repetitive bits to functions. The ‘header’ and ‘p’ functions are (for now) unaltered from the gadflypaper demo. The good part here is that strings can be defined as ‘just strings’. Paragraphs and headers are just plain old string variables, and then at the bottom I just call the ‘header’ and ‘p’ functions and pass in the variables. The order in which I call the functions determines the order my document will appear in.

Phase 3

There’s kind of an issue with the way these functions work, at least for my needs. The problem is that they just go ahead and add things to the “Elements” list automagically. This might be ok for some quick and dirty tasks, but in my case I found that I needed more control. Things were crossing page boundaries where I didn’t want them to, and if I want to add formatting or apply built-in functionality, I can’t do it on a per-object basis without loading up the argument list.

I also wanted to have a relatively easy way to move *sections* of reports around, where a section might consist of a heading, a paragraph, and a source code listing — three different “Flowable” objects. So I altered these functions to make them return flowables instead of just adding things to the Elements list for me:

#!/usr/bin/env python

from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

PAGE_HEIGHT=defaultPageSize[1]
styles = getSampleStyleSheet()
Title = "Generating Reports with Python"
Author = "Brian K. Jones"
URL = "http://protocolostomy.com"
email = "bkjones@gmail.com"
Abstract = """This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab."""
Elements=[]
HeaderStyle = styles["Heading1"]
ParaStyle = styles["Normal"]
PreStyle = styles["Code"]

def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

mytitle = header(Title)
myname = header(Author, sep=0.1, style=ParaStyle)
mysite = header(URL, sep=0.1, style=ParaStyle)
mymail = header(email, sep=0.1, style=ParaStyle)
abstract_title = header("ABSTRACT")
myabstract = p(Abstract)
head_info = [mytitle, myname, mysite, mymail, abstract_title, myabstract]
Elements.extend(head_info)

code_title = header("Basic code to produce output")
code_explain = p("""This is a snippet of code. It's an example using the Preformatted flowable object, which
                 makes it easy to put code into your documents. Enjoy!""")
code_source = pre("""
def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)
    """)
codesection = [code_title, code_explain, code_source]
src = KeepTogether(codesection)
Elements.append(src)
go()

So, this isn’t too bad. It’s still plain procedural code: just functions and module-level statements. I’ll revamp it in another post to use objects, but for those readers who are still learning all of this, it might help to leave out the abstraction for now. What I liked about the gadflypaper demo was that it was quick and dirty. You could read it line by line, top to bottom, and understand what just happened without jumping back and forth between main() code and object code.

As you can see, I’m using the KeepTogether flowable (it’s a class, not a method) in two different ways. In the functions, I use it to bundle each spacer with its text, so I don’t have to go back later and manually add spacer elements to the Elements list. Then, toward the bottom, I create a preformatted code snippet, and I wrap the whole code section in KeepTogether to make sure that all of its parts stay together without flowing across a page boundary. There are other options you can use to customize how your document deals with ‘orphan’ and ‘widow’ elements as well, so definitely check out the documentation for that (or keep reading this blog. I’ll get to it eventually).

So what’s left?

Phase 4: The Grand Finale

The rest of the code I add is to connect to a database, make a query, and then pass the data returned from the database to a function that creates a chart. I add the chart to the Elements, and we’re in business!

#!/usr/bin/env python
import MySQLdb
import sys
from reportlab.graphics.shapes import Drawing
from reportlab.graphics.charts.linecharts import HorizontalLineChart
from reportlab.graphics.widgets.markers import makeMarker  # used by graphout() below
from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

dbhost = 'localhost'
dbname = 'httplog'
dbuser = 'jonesy'
dbpasswd = 'mypassword'

PAGE_HEIGHT=defaultPageSize[1]
styles = getSampleStyleSheet()
Title = "Generating Reports with Python"
Author = "Brian K. Jones"
URL = "http://protocolostomy.com"
email = "bkjones@gmail.com"
Abstract = """This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab."""
Elements=[]
HeaderStyle = styles["Heading1"]
ParaStyle = styles["Normal"]
PreStyle = styles["Code"]

def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def connect():
   try:
      conn1 = MySQLdb.connect(host = dbhost, user = dbuser, passwd = dbpasswd, db = dbname)
      return conn1
   except MySQLdb.Error as e:
      # "as e" and print(...) with a single argument work on Python 2.6+ and 3
      print("Error %d: %s" % (e.args[0], e.args[1]))
      sys.exit (1)

def getcursor(conn):
   cursor = conn.cursor()
   return cursor

def totalevents_hourly(rcursor):
    rcursor.execute("""select hour, count(*) as hits from hits group by hour;""")
    return rcursor

def graphout(catnames, data):
    drawing = Drawing(400, 200)
    lc = HorizontalLineChart()
    lc.x = 30
    lc.y = 50
    lc.height = 125
    lc.width = 350
    lc.data = data
    lc.categoryAxis.categoryNames = catnames
    lc.categoryAxis.labels.boxAnchor = 'n'
    lc.valueAxis.valueMin = 0
    lc.valueAxis.valueMax = 1500
    lc.valueAxis.valueStep = 300
    lc.lines[0].strokeWidth = 2
    lc.lines[0].symbol = makeMarker('FilledCircle') # added to make filled circles.
    lc.lines[1].strokeWidth = 1.5
    drawing.add(lc)
    return drawing

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

mytitle = header(Title)
myname = header(Author, sep=0.1, style=ParaStyle)
mysite = header(URL, sep=0.1, style=ParaStyle)
mymail = header(email, sep=0.1, style=ParaStyle)
abstract_title = header("ABSTRACT")
myabstract = p(Abstract)
head_info = [mytitle, myname, mysite, mymail, abstract_title, myabstract]
Elements.extend(head_info)

code_title = header("Basic code to produce output")
code_explain = p("""This is a snippet of code. It's an example using the Preformatted flowable object, which
                 makes it easy to put code into your documents. Enjoy!""")
code_source = pre("""
def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)
    """)
codesection = [code_title, code_explain, code_source]
src = KeepTogether(codesection)
Elements.append(src)

hourly_title = header("Hits logged, per hour")
hourly_explain = p("""This shows aggregate hits across a 24-hour period. """)

conn = connect()
cur = getcursor(conn)
te_hourly = totalevents_hourly(cur)
catnames = []
data = []
values = []
for row in te_hourly:
   catnames.append(str(row[0]))
   values.append(row[1])

data.append(values)
hourly_chart = graphout(catnames, data)
hourly_section = [hourly_title, hourly_explain, hourly_chart]
Elements.extend(hourly_section)

go()

So, I’ve muddied things up a bit. If you’ve written database code before, you can just look past it all. I don’t do anything magical there. In fact, the chart creation isn’t magical either. I’m sure there’s even a cleaner way to do it – but this works for the moment.

I get a connection object, use it to get a cursor, then pass the cursor to the query function, which passes back the same cursor, now holding the result set: te_hourly. The chart I’m going to create needs ‘category’ names to label the x-axis, and then values to plot against the y-axis. In my case, the hour is row[0] and the total hits for that hour are in row[1]. I build my catnames and values lists, wrap values in the data list (the chart expects a list of series), and then create “hourly_chart” by passing catnames and data to the graphout function. Finally, I add the chart, along with its title and explanation, to the Elements list. Done!
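The row-to-list step is easy to try without a MySQL server. Here is a minimal sketch that uses the standard library's sqlite3 module as a stand-in for MySQLdb (an assumption made only so the snippet runs anywhere). The hits table contents and the hourly_hits helper name are made up for illustration, and the "order by hour" clause is an addition so that the categories come back in a predictable order.

```python
import sqlite3


def hourly_hits(cursor):
    """Run the per-hour aggregate and return (catnames, data) for the chart."""
    cursor.execute(
        "select hour, count(*) as hits from hits group by hour order by hour")
    catnames = []
    values = []
    for hour, hits in cursor:
        catnames.append(str(hour))  # axis labels must be strings
        values.append(hits)
    # The line chart expects a list of series, so wrap the single series.
    return catnames, [values]


# Hypothetical sample data: one row per logged request.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table hits (hour integer, url text)")
cur.executemany(
    "insert into hits values (?, ?)",
    [(0, "/"), (0, "/a"), (1, "/"), (13, "/b"), (13, "/"), (13, "/c")])

catnames, data = hourly_hits(cur)
print(catnames)  # ['0', '1', '13']
print(data)      # [[2, 1, 3]]
```

The two return values are in the shape that graphout(catnames, data) takes in the listing above.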

For its part, the graphout function is mostly just a bunch of parameters I need to configure my HorizontalLineChart object. Once the chart is all set to go, I need to add it onto my Drawing object, and return the Drawing flowable object.

Not yet what I’d call “Beautiful Code”, but it works, and it’s likely to help some other folks get over the ‘getting started’ hump with ReportLab. Hope it was useful.

Clone a table in MySQL without killing your server

Thursday, October 9th, 2008

So, I recently ran into one of those situations where a customer complains that his MySQL database is slow, and “it worked great until about two weeks ago”. The first thing I look at in these situations is not the queries or any code, but the indexes. Once I determined that the indexes were almost nonsensical, I had a look at the queries themselves, which proved that the indexes were, in fact, nonsensical. Running the queries as written in the code, from a mysql shell, with EXPLAIN, I was able to further confirm that the indexes (most of them, anyway) were never used.

Easy right? Just reindex the table!

NOTE: I’m going to skip the talk about all of the database design issues on this gig and just go straight to solving the immediate problem at hand. Just know that I had nothing to do with the design.

But suppose this table has 15 million rows and is running on a machine with 2GB of RAM and only a single (well, mirrored) drive. It will take forever to reindex that table, and the machine can’t be made completely unavailable at any time, either by driving the load up so high that the Linux OOM killer reaps the mysql process, or by putting a lock on the table for, oh, 3 days or so :-)

The solution is to create a brand new table, which I’ll call “good” using the “SHOW CREATE TABLE” output from the bad table, which I’ll call “bad”. I do this right in the shell, actually. I run “SHOW CREATE TABLE bad”, cut and paste the output, remove the part that defines the indexes, rename the table to “good”, and bam! New, empty table.

Of course, you still have to define your new indexes, so run whatever statements are needed to do that. You might even want to populate it with some test data (say, 10000 rows from the bad table) to test that your new indexes are being used as expected (or that they can be by altering the query and getting back the same results… but faster).

Once done, it’s time to start copying rows from bad to good. I’ve written a shell script to help with that part. It’s designed to run on a Linux host running MySQL.

The variables at the top of the script are pretty self-explanatory, except to note that there are separate NEWDB and OLDDB variables in case your new table also lives in a new database. INCR is the number of rows you want to copy over at a time. If you set INCR to 1000, it’ll copy 1000 rows, check the load average, and if it’s under MAXLOAD, it’ll copy over another 1000 rows. It also keeps track of the number of rows in each table as it goes, since writes are still happening on the bad table while this is going on.

So here’s my nibbler script, in shell. I would’ve written it in Python, but it wasn’t my machine, and I couldn’t install the python mysql module :-/

#!/bin/bash

###
### Written by Brian K. Jones (bkjones@gmail.com)
### 
### Takes an increment, old db, and new db, and copies rows from olddb to newdb. 
### Along the way, it'll check system load and sleep if it's too high. 
### There's too much hard-coding right now, but it's a useful template, and 
### has been tested. The script takes no CLI arguments. 
###

INCR=10000
NEWDB=shiny
OLDDB=busted
OLDTABLE=bad
NEWTABLE=good
MAXLOAD=3
DBUSER=mydbuser
DBPASS=mydbpass

rows_old=`mysql -N -D ${OLDDB} -u ${DBUSER} -p${DBPASS} -e "select count(*) from ${OLDTABLE}"`
echo "rows_old is now ${rows_old}" 

rows_new=`mysql -N -D ${NEWDB} -u ${DBUSER} -p${DBPASS} -e "select count(*) from ${NEWTABLE}"`     ## num. rows in new table
echo "rows_new is now ${rows_new}" 

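## rows_new is re-read from the new table at the bottom of each pass, so a plain
## while loop is used here; a for-loop increment on top of that would skip a batch.
while [ "${rows_new}" -lt "${rows_old}" ]; do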
        if [ $((rows_old - (rows_new + INCR))) -gt 0 ]; then         ## Check to see if there are at least $INCR rows left to copy over. 
            mysql -N -D ${NEWDB} -u ${DBUSER} -p${DBPASS} -e "INSERT INTO ${NEWTABLE} SELECT * FROM ${OLDDB}.${OLDTABLE} LIMIT ${rows_new},${INCR}"
      
            while [ "`awk -v max=${MAXLOAD} '$1 > max {print "TRUE"}' /proc/loadavg`" = "TRUE" ]; do 
               echo "sleeping due to load > ${MAXLOAD}"
               sleep 30
            done
            # we update rows_old because it'll be growing while this script runs. 
            rows_old=`mysql -N -D ${OLDDB} -u ${DBUSER} -p${DBPASS} -e "select count(*) from ${OLDTABLE}" `
            rows_new=`mysql -N -D ${NEWDB} -u ${DBUSER} -p${DBPASS} -e"select count(*) from ${NEWTABLE}"`
            time=`date +%R`
            echo "${time} -- rows_new = ${rows_new}, rows_old = ${rows_old}"  

        else                           ## There are < $INCR rows left. Select remaining rows. 
            remaining=$((rows_old - rows_new))
            mysql -N -D ${NEWDB} -u ${DBUSER} -p${DBPASS} -e "INSERT INTO ${NEWTABLE} SELECT * FROM ${OLDDB}.${OLDTABLE} LIMIT ${rows_new},${remaining}" 
            echo "All done!" 
            exit
        fi
done
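
For comparison, here is a rough sketch of the same nibbling idea in Python. It uses the standard library's sqlite3 module as a stand-in for MySQL so that it runs without a server or the MySQL module; that swap is an assumption for illustration, as are the names nibble, count_rows, load_is_high, is_busy and pause. The "order by rowid" clause is also an addition, since LIMIT with an offset only returns a stable slice when the rows have a fixed order (on MySQL, ordering by the primary key would play that role).

```python
import os
import sqlite3
import time


def count_rows(conn, table):
    # Table names here are trusted constants, not user input.
    return conn.execute("select count(*) from %s" % table).fetchone()[0]


def load_is_high(maxload):
    """True when the 1-minute load average exceeds maxload (Unix only)."""
    return os.getloadavg()[0] > maxload


def nibble(conn, old, new, incr=1000, maxload=3, is_busy=load_is_high, pause=30):
    """Copy rows from `old` to `new` in batches of `incr`, backing off under load."""
    copied = count_rows(conn, new)
    # The old table is re-counted on every pass because it may still be growing.
    while copied < count_rows(conn, old):
        conn.execute(
            "insert into %s select * from %s order by rowid limit ? offset ?"
            % (new, old),
            (incr, copied))
        conn.commit()
        while is_busy(maxload):
            time.sleep(pause)
        copied = count_rows(conn, new)
    return copied


# Hypothetical demo data: 2500 rows, copied 1000 at a time.
conn = sqlite3.connect(":memory:")
conn.execute("create table bad (id integer, payload text)")
conn.execute("create table good (id integer, payload text)")
conn.executemany(
    "insert into bad values (?, ?)",
    [(i, "row %d" % i) for i in range(2500)])

# The load check is stubbed out so the demo never sleeps.
total = nibble(conn, "bad", "good", incr=1000, is_busy=lambda maxload: False)
print(total)  # 2500
```

Passing the load check in as a function keeps the copy loop testable; on a real Linux host the default load_is_high reads the same 1-minute figure the shell script pulls from /proc/loadavg.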

A merger, migration, mysql, python, and more news

Tuesday, September 30th, 2008

First, AddThis.com (where I was the director of IT) and Clearspring have merged! A side effect of that is that I’m now (happily, on purpose, by choice) a full-time consultant! I’ll have a web site up soonish. Until then, check back here for updates. If you’re a tech firm that needs help and doesn’t mind remote workers, send mail to bkjones at Google’s mail service (.com).

Some folks thought I’d passed away due to the uncharacteristic lull in posting frequency on this blog. I’m very much alive — but working for a startup and maintaining a consulting business simultaneously is hard, especially when two large projects fall into your lap at the same time. So what have I been up to?

Well, as part of the now-public merger between the company I worked for and the new company, I was doing the migration of our infrastructure to theirs. That involved rewriting some backup scripts, writing a data synchronization routine (complete with backout capability; I’ll post code samples after I clean out all the site-specific stuff, and it’s python and MySQLdb!), setting up a different (and kinda cool) MySQL replication scheme and a different (and also kinda cool) failover scheme, testing, testing, testing, coordinating with everyone down in VA at the new company to make sure everything was working and in place, and then “flipping the big ol’ switch”.

Now I’ll be writing even more scripts, planning even more migration of infrastructure services components, doing more testing, and retiring old AddThis assets.

I’m happy to say that the folks at Clearspring have been an awesome team to work with. The culture there is kind of like “nobody here is any better than anybody else”, and it has a rather dramatic effect on productivity compared to other places where everyone thinks they’re not allowed to be wrong and have to preserve their jobs and make other people look stupid. Nope, if you’re a junior guy or someone from another department with a good idea, and it works, that’s great! It’ll be poked and prodded at, and if it passes muster, it’ll be deployed. And why shouldn’t it be? Generally, because egos get in the way. In my book, if you don’t stand in the way of something great, you’re more of a hero than if you do. The ideas don’t all have to be yours.

And, get this, they document their stuff! And it’s easy to use and browse! I’m on a documentation project right now for another client, and it’s challenging just to get people to talk about documentation. :-/

So what’s next for me? I’m honestly not sure at the moment. I’m consulting, it’s going well, I have more than enough work, but I also have a little bit of ‘the bug’, so don’t be surprised if I come back here and tell you that I’ve joined some “social multimedia web 2.0 mindmapping in the cloud with sharing” company or something like that in the near future.