What is this module for?
Interacting with a PostgreSQL database in Python.
What is PostgreSQL?
PostgreSQL is an open source relational database product. It has some more advanced features, like built-in networking-related and GIS-related datatypes, the ability to script stored functions in multiple languages (including Python), etc. If you have never heard of PostgreSQL, get out from under your rock!
Making Contact
Using the pscyopg2 module to connect to a PostgreSQL database couldn’t be simpler. You can use the connect() method of the module, passing in either the individual arguments required to make contact (dbname, user, etc), or you can pass them in as one long “DSN” string, like this:
dsn = "host=localhost port=6000 dbname=testdb user=jonesy" conn = psycopg2.connect(dsn) conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
The DSN value is a space-delimited collection of key=value pairs, which I construct before sending the dsn to the psycopg2.connect() method. Once we have a connection object, the very first thing I do is set the connection’s isolation level to ‘autocommit’, so that INSERT and UPDATE transactions are committed automatically without my having to call conn.commit() after each transaction. There are several isolation levels defined in the psycopg2.extensions package, and they’re defined in ‘extensions’ because they go beyond what is defined in the DB API 2.0 spec that is typically used as a reference in creating Python database modules.
Simple Queries and Type Conversion
In order to get anything out of the database, we have to know how to talk to it. Of course this means writing some SQL, but it also means sending query arguments in a format understood by the database. I’m happy to report that psycopg2 does a pretty good job of making things “just work” when it comes to converting your input into PostgreSQL types, and converting the output directly into Python types for easy manipulation in your code. That said, understanding how to properly use these features can be a bit confusing at first, so let me address the source of a lot of early confusion right away:
cur = conn.cursor() cur.execute("""SELECT id, fname, lname, balance FROM accounts WHERE balance > %s""", min_balance)
Chances are, min_balance is an integer, but we’re using ‘%s’ anyway. Why? Because this isn’t really you telling Python to do a string formatting operation, it’s you telling psycopg2 to convert the incoming data using the default psycopg2 method, which converts integers into the PostgreSQL INT type. So, you can use “%s” in the ‘execute()’ method to properly convert integers, strings, dates, datetimes, timedeltas, lists, tuples and most other native Python types to a corresponding PostgreSQL type. There are adapters built into psycopg2 as well if you need more control over the type conversion process.
Cursors
Psycopg2 makes it pretty easy to get your results back in a format that is easy for the receiving code to deal with. For example, the projects I work on tend to use the RealDictCursor type, because the code tends to require accessing the parts of the resultset rows by name rather than by index (or just via blind looping). Here’s how to set up and use a RealDictCursor:
curs = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) curs.execute("SELECT id, name FROM users") rs = curs.fetchall() for row in rs: print rs['id'], rs['name']
It’s possible you have two sections of code that’ll rip apart a result set, and one needs by-name access, and the other just wants to loop blindly or access by index number. If that’s the case, just replace ‘RealDictCursor’ with ‘DictCursor’, and you can have it both ways!
Another nice thing about psycopg2 is the cursor.query attribute and cursor.mogrify method. Mogrify allows you to test and see how a query will look after all input variables are bound, but before the query is sent to the server. Cursor.query prints out the exact query that was actually sent over the wire. I use cursor.query in my logging output all the time to catch out-of-order parameters and mismatched input types, etc. Here’s an example:
try: curs.callproc('myschema.myprocedure', callproc_params) except Exception as out: print out print curs.query
Calling Stored Functions
Stored procedures or ‘functions’ in PostgreSQL-speak can be immensely useful in large complex applications where you want to enforce business rules in a single place outside the domain of the main application developers. It can also in some cases be more efficient to put functionality in the database than in the main application code. In addition, if you’re hiring developers, they should develop in the standard language for your environment, not SQL: SQL should be written by database administrators and developers, and exposed to the developers as needed, so all the developers have to do is call this newly-exposed function. Here’s how to call a function using psycopg2:
callproc_params = [uname, fname, lname, uid] cur.callproc("myschema.myproc", callproc_params)
The first argument to ‘callproc()’ is the name of the stored procedure, and the second argument is a sequence holding the input parameters to the function. The input parameters should be in the order that the stored procedure expects them, and I’ve found after quite a bit of usage that the module typically is able to convert the types perfectly well without my intervention, with one exception…
The UUID Array
PostgreSQL has built-in support for lots of interesting data types, like INET types for supporting IP addresses and CIDR network blocks, and GIS-related data types. In addition, PostgreSQL supports a type that is an array of UUIDs. This comes in handy if you use a UUID to identify items and want to store an array of them to associate with an order, or you use UUIDs to track messages and want to store an array of them together to represent a message thread or conversation. To get a UUID array into the database quickly and easily, it’s really not too difficult. If you have a list of strings that are UUID strings, you can do a quick conversion, call one function, and then use the array like any other input parameter:
my_uuid_arr = [uuid.UUID(i) for i in my_uuid_arr] psycopg2.extras.register_uuid() callproc_params = [ myvar1, myvar2, my_uuid_arr ] curs.callproc('myschema.myproc', callproc_params)
Connection Status
It’s not a given that your database connection lives on from query to query, and you shouldn’t really just assume that because you did a query a fraction of a second ago that it’s still around now. Actually, to speak about things more Pythonically, you *should* assume the connection is still there, but be ready for failure, and check the connection status to diagnose and help get things back on track. You can check the ‘status’ attribute of your connection object. Here’s one way you might do it:
@property def active_dbconn(self): return self.conn.status in [psycopg2.extensions.STATUS_READY, psycopg2.extensions.STATUS_BEGIN]:
So, I’m assuming here that you have some object that has a connection object that it refers to as ‘self.connection’. This one-liner function uses the @property built-in Python decorator, so the other methods in the class can either check the connection status before attempting a query:
if self.active_dbconn: try: curs.execute(...) except Exception as out: logging.error("Houston we have a problem")
Or you can flip that around like this:
try: curs.execute(...) except Exception as out: if not self.active_dbconn: logging.error("Execution failed because your connection is dead") else: logging.error("Execution failed in spite of live connection: %s" % out)
Read On…
A database is a large, complex beast. There’s no way to cover the entirety of a database or a module that talks to it in a simple blog post, but I hope I’ve been able to show some of the more common features, and maybe one or two other items of interest. If you want to know more, I’m happy to report that, after a LONG time of being unmaintained, the project has recently sprung back to life and is pretty well-documented these days. Check it out!