UPDATE (Mar. 26, 2010) Just realized I never posted the link to the PDF the code here generates: here it is. My bad.
I’ve been doing a little reporting project, and I’ve been searching around for quite some time for a good graphing and charting solution for general-purpose use. I had come across ReportLab before, but it just looked so huge and convoluted to me, given the simplicity of what I wanted at the time, that I moved on. This time was different.
This time I needed a lot of the capabilities of ReportLab. I needed to generate PDFs (this is not a web-based project), I needed to generate charts, and I wanted the reports I was generating to contain various types of text objects in addition to the charts and such.
I took the cliff-dive into the depths of the ReportLab documentation. I discovered three things:
- There is quite a lot of documentation
- ReportLab is quite a capable library
- The documentation actually belies the simplicity of the library.
It’s a decent bit easier than it looks in the documentation, so I thought I’d take you through an example. This example is dead simple, but I still think it’s a little more practical than what I was able to find. The ReportLab documentation refers to what sounds like a great reference example, but the problem is that the tarball I downloaded didn’t contain the files it was making reference to 🙁
I started out by investigating one of the small example projects in the “demo” directory inside the ReportLab directory. It was called “gadflypaper” (coincidentally, written by Aaron Watters; I worked in the cube outside of his office for several months last year. Hi Aaron!). Aaron’s example was very simple, and a great starting point for understanding how to put together a very basic document. It’s not infested with abstractions: just a few simple functions, and a lot of text. I ripped out a lot of the text until I had just an example of each function in action, and then set to work.
The Basic Process
To simplify the work of doing page layout minutiae, I (like the example) used PLATYPUS, which is built into ReportLab and abstracts away some of the low-level layout details. If you *want* low-level control, however, you can do whatever you want with the pdfgen module, also included (PLATYPUS is basically a layer built on top of it).
With PLATYPUS, you get access to a bunch of prebuilt layout-related objects, representing things like paragraphs, tables, frames, and other things. You also have access to page templates, so that dealing with things like frame placement is a little easier.
So, to give you a rundown of the high-level steps:
- Choose a page template, and use it to create a document object.
- Create your “flowables” (paragraphs, charts, images, etc.), and put them all into a list object. In the ReportLab documentation, this is often a list named “story”.
- Pass the list object to the build() method of the document object you created in step 1.
Phase 1: Let’s Get Something Working
As a first phase, let’s just make sure we can do the simplest of documents. Here’s some code that should work if you have a good installation of ReportLab (I’m using whatever was the latest version in early October, 2008.) Note that we’ll be cleaning this up and simplifying it as we go along.
#!/usr/bin/env python
from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

PAGE_HEIGHT=defaultPageSize[1]

styles = getSampleStyleSheet()

Title = Paragraph("Generating Reports with Python", styles["Heading1"])
Author = Paragraph("Brian K. Jones", styles["Normal"])
URL = Paragraph("https://protocolostomy.com", styles["Normal"])
email = Paragraph("bkjones +_at_+ gmail.com", styles["Normal"])
Abstract = Paragraph("""This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab.""", styles["Normal"])

Elements = [Title, Author, URL, email, Abstract]

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

go()
Not a lot of actual code here. It’s mostly variable assignments. The variables are mostly just strings, but because I want to control how they’re arranged, I need to make them “Flowables”. Remember that PLATYPUS puts together a document by processing a list of Flowable objects and drawing them onto the document. So all of our strings are “Paragraph” objects. You’ll note, too, that Paragraph objects can be styled using definitions accessed from getSampleStyleSheet, which returns a ‘style object’. If you create one of these at the Python interpreter, and call the resulting object’s ‘list()’ function, you’ll see what styles are available, and you’ll also see what attributes each style has. Try running this code to make sure things work. Change the strings if you like 🙂
Phase 2: Simple Cleanup
I haven’t yet created insane layers of abstraction in my own code, because I’ve been working on deadlines and doing things that are relatively simple. This will inevitably change 🙂 However, there are some things you can do to make life a bit simpler and cleaner.
#!/usr/bin/env python
from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

PAGE_HEIGHT=defaultPageSize[1]

styles = getSampleStyleSheet()

Title = "Generating Reports with Python"
Author = "Brian K. Jones"
URL = "https://protocolostomy.com"
email = "bkjones@gmail.com"
Abstract = """This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab."""

Elements=[]
HeaderStyle = styles["Heading1"]
ParaStyle = styles["Normal"]
PreStyle = styles["Code"]

def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    Elements.append(s)
    para = klass(txt, style)
    Elements.append(para)

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

header(Title)
header(Author, sep=0.1, style=ParaStyle)
header(URL, sep=0.1, style=ParaStyle)
header(email, sep=0.1, style=ParaStyle)
header("ABSTRACT")
p(Abstract)
go()
So, this is still simple. Simplistic, even. All I did was move the repetitive bits to functions. The ‘header’ and ‘p’ functions are (for now) unaltered from the gadflypaper demo. The good part here is that strings can be defined as ‘just strings’. Paragraphs and headers are just plain old string variables, and then at the bottom I just call the ‘header’ and ‘p’ functions and pass in the variables. The order in which I call the functions determines the order my document will appear in.
Phase 3
There’s kind of an issue with the way these functions work, at least for my needs. The problem is that they just go ahead and add things to the “Elements” list automagically. This might be ok for some quick and dirty tasks, but in my case I found that I needed more control. Things were crossing page boundaries where I didn’t want them to, and if I want to add formatting or apply built-in functionality, I can’t do it on a per-object basis without loading up the argument list.
I also wanted to have a relatively easy way to move *sections* of reports around, where a section might consist of a heading, a paragraph, and a source code listing — three different “Flowable” objects. So I altered these functions to make them return flowables instead of just adding things to the Elements list for me:
#!/usr/bin/env python
from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

PAGE_HEIGHT=defaultPageSize[1]

styles = getSampleStyleSheet()

Title = "Generating Reports with Python"
Author = "Brian K. Jones"
URL = "https://protocolostomy.com"
email = "bkjones@gmail.com"
Abstract = """This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab."""

Elements=[]
HeaderStyle = styles["Heading1"]
ParaStyle = styles["Normal"]
PreStyle = styles["Code"]

def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

mytitle = header(Title)
myname = header(Author, sep=0.1, style=ParaStyle)
mysite = header(URL, sep=0.1, style=ParaStyle)
mymail = header(email, sep=0.1, style=ParaStyle)
abstract_title = header("ABSTRACT")
myabstract = p(Abstract)
head_info = [mytitle, myname, mysite, mymail, abstract_title, myabstract]
Elements.extend(head_info)

code_title = header("Basic code to produce output")
code_explain = p("""This is a snippet of code. It's an example using the Preformatted flowable object, which
makes it easy to put code into your documents.
Enjoy!""")
code_source = pre("""
def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)
""")
codesection = [code_title, code_explain, code_source]
src = KeepTogether(codesection)
Elements.append(src)
go()
So, this isn’t too bad. It’s still plain procedural code. I’ll revamp it in another post to use objects, but for those readers who are still learning all of this, it might help to leave out the abstraction for now. What I liked about the gadflypaper demo was that it was quick and dirty. You could read it line by line, top to bottom, and understand what just happened without jumping back and forth between main() code and object code.
As you can see, I’m using the KeepTogether flowable (it’s a class, not a method) in two different ways. In the functions, I use it to bundle each spacer with the text it belongs to, so I don’t have to go back later and manually add spacer elements to the Elements list. Then, toward the bottom, I create a preformatted code snippet, and I wrap the whole code section in KeepTogether to make sure that all of its parts stay together without flowing across a page boundary. There are other options you can use to customize how your document deals with ‘orphan’ and ‘widow’ elements as well, so definitely check out the documentation for that (or keep reading this blog. I’ll get to it eventually).
So what’s left?
Phase 4: The Grand Finale
The rest of the code I add is to connect to a database, make a query, and then pass the data returned from the database to a function that creates a chart. I add the chart to the Elements, and we’re in business!
#!/usr/bin/env python
import MySQLdb
import sys
from reportlab.graphics.shapes import Drawing
from reportlab.graphics.charts.linecharts import HorizontalLineChart
# makeMarker is used in graphout() below, so it has to be imported.
from reportlab.graphics.widgets.markers import makeMarker
from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

dbhost = 'localhost'
dbname = 'httplog'
dbuser = 'jonesy'
dbpasswd = 'mypassword'

PAGE_HEIGHT=defaultPageSize[1]

styles = getSampleStyleSheet()

Title = "Generating Reports with Python"
Author = "Brian K. Jones"
URL = "https://protocolostomy.com"
email = "bkjones@gmail.com"
Abstract = """This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab."""

Elements=[]
HeaderStyle = styles["Heading1"]
ParaStyle = styles["Normal"]
PreStyle = styles["Code"]

def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def connect():
    try:
        conn1 = MySQLdb.connect(host = dbhost, user = dbuser, passwd = dbpasswd, db = dbname)
        return conn1
    except MySQLdb.Error as e:
        print("Error %d: %s" % (e.args[0], e.args[1]))
        sys.exit(1)

def getcursor(conn):
    cursor = conn.cursor()
    return cursor

def totalevents_hourly(rcursor):
    rcursor.execute("""select hour, count(*) as hits from hits group by hour;""")
    return rcursor

def graphout(catnames, data):
    drawing = Drawing(400, 200)
    lc = HorizontalLineChart()
    lc.x = 30
    lc.y = 50
    lc.height = 125
    lc.width = 350
    lc.data = data
    catNames = catnames
    lc.categoryAxis.categoryNames = catNames
    lc.categoryAxis.labels.boxAnchor = 'n'
    lc.valueAxis.valueMin = 0
    lc.valueAxis.valueMax = 1500
    lc.valueAxis.valueStep = 300
    lc.lines[0].strokeWidth = 2
    lc.lines[0].symbol = makeMarker('FilledCircle') # added to make filled circles.
    lc.lines[1].strokeWidth = 1.5 # only matters if a second data series is passed in
    drawing.add(lc)
    return drawing

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

mytitle = header(Title)
myname = header(Author, sep=0.1, style=ParaStyle)
mysite = header(URL, sep=0.1, style=ParaStyle)
mymail = header(email, sep=0.1, style=ParaStyle)
abstract_title = header("ABSTRACT")
myabstract = p(Abstract)
head_info = [mytitle, myname, mysite, mymail, abstract_title, myabstract]
Elements.extend(head_info)

code_title = header("Basic code to produce output")
code_explain = p("""This is a snippet of code. It's an example using the Preformatted flowable object, which
makes it easy to put code into your documents. Enjoy!""")
code_source = pre("""
def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)
""")
codesection = [code_title, code_explain, code_source]
src = KeepTogether(codesection)
Elements.append(src)

hourly_title = header("Hits logged, per hour")
hourly_explain = p("""This shows aggregate hits across a 24-hour period. """)

conn = connect()
cur = getcursor(conn)
te_hourly = totalevents_hourly(cur)

# Column 0 is the hour (category name), column 1 is the hit count (plotted value).
catnames = []
data = []
values = []
for row in te_hourly:
    catnames.append(str(row[0]))
    values.append(row[1])
data.append(values)

hourly_chart = graphout(catnames, data)
hourly_section = [hourly_title, hourly_explain, hourly_chart]
Elements.extend(hourly_section)
go()
So, I’ve muddied things up a bit. If you’ve written database code before, you can just look past it all. I don’t do anything magical there. In fact, the chart creation isn’t magical either. I’m sure there’s even a cleaner way to do it – but this works for the moment.
I get a connection object, use it to get a cursor, then pass the cursor to the query function, which passes back that same cursor, now holding the query results: te_hourly. The chart I’m going to create needs ‘category’ names for the x-axis, and then values to plot against the y-axis. In my case, the hour is row[0] and the total hits for that hour are in row[1]. I build my catnames and data lists, and then create “hourly_chart” by passing my lists to the graphout function. Finally, I add the chart, along with its title and explanation, to the Elements list. Done!
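If you don’t have a MySQL server handy, you can try the row-to-list step on its own. This sketch swaps in Python’s built-in sqlite3 module with a throwaway in-memory table. The table layout and the rows are made up for illustration and are not my real httplog schema:

```python
import sqlite3

# In-memory stand-in for the "hits" table; these rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (hour INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO hits (hour, url) VALUES (?, ?)",
    [(0, "/"), (0, "/about"), (1, "/"), (2, "/"), (2, "/"), (2, "/blog")],
)

cursor = conn.execute(
    "SELECT hour, COUNT(*) AS hits FROM hits GROUP BY hour ORDER BY hour"
)

catnames = []
values = []
for hour, hits in cursor:
    catnames.append(str(hour))  # category labels must be strings
    values.append(hits)

# The chart expects a list of series, so the single series is wrapped in a list.
data = [values]
conn.close()

print(catnames)  # ['0', '1', '2']
print(data)      # [[2, 1, 3]]
```

The resulting catnames and data have the same shape as the lists handed to graphout, so the chart code doesn’t care which database they came from.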
For its part, the graphout function is mostly just a bunch of parameters I need to configure my HorizontalLineChart object. Once the chart is all set to go, I need to add it onto my Drawing object, and return the Drawing flowable object.
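One thing to watch in graphout is that the value axis tops out at a hard-coded 1500 with a step of 300, which only suits my data. Below is a small sketch of deriving those two numbers from the data instead. The axis_bounds helper is hypothetical (it is not part of ReportLab), and it makes no attempt to round to “nice” tick values:

```python
import math


def axis_bounds(series, steps=5):
    """Return (value_max, value_step) covering every value in `series`.

    `series` is a list of number sequences, shaped like graphout's `data`.
    """
    highest = max(max(values) for values in series)
    # Round the step up to a whole number so the top tick is at or above the peak.
    step = max(1, int(math.ceil(highest / float(steps))))
    return step * steps, step


value_max, value_step = axis_bounds([[120, 940, 1310, 760]])
print(value_max, value_step)  # 1310 262
```

Inside graphout, the two results would be assigned to lc.valueAxis.valueMax and lc.valueAxis.valueStep in place of the fixed numbers.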
Not yet what I’d call “Beautiful Code”, but it works, and it’s likely to help some other folks wade through the ‘getting started’ hump with ReportLab. Hope it was useful.