Skip to content

Musings of an Anonymous Geek

Made with only the finest 1's and 0's

Menu
  • About
  • Search Results
Menu

Python, PostgreSQL, and psycopg2’s Dusty Corners

Posted on December 1, 2009 by bkjones

Last time I wrote code with psycopg2 was around 2006, but I was reacquainted with it over the past couple of weeks, and I wanted to make some notes on a couple of features that are not well documented, imho. Portions of this post have been snipped from mailing list threads I was involved in.

Calling PostgreSQL Functions with psycopg2

So you need to call a function. Me too. I had to call a function called ‘myapp.new_user’. It expects a bunch of input arguments. Here’s my first shot after misreading some piece of some example code somewhere:

qdict = {'fname': self.fname, 'lname': self.lname, 'dob': self.dob, 'city': self.city, 'state': self.state, 'zip': self.zipcode}

sqlcall = """SELECT * FROM myapp.new_user( %(fname)s, %(lname)s,
%(dob)s, %(city)s, %(state)s, %(zip)s""" % qdict

curs.execute(sqlcall)

There’s no reason this should work, or that anyone should expect it to work. I just wanted to include it in case someone else made the same mistake. Sure, the proper arguments are put in their proper places in ‘sqlcall’, but they’re not quoted at all.

Of course, I foolishly tried going back and putting quotes around all of those named string formatting arguments, and of course that fails when you have something like a quoted “NULL” trying to move into a date column. It has other issues too, like being error-prone and a PITA, but hey, it was pre-coffee time.

What’s needed is a solution whereby psycopg2 takes care of the formatting for us, so that strings become strings, NULLs are passed in a way that PostgreSQL recognizes them, dates are passed in the proper format, and all that jazz.

My next attempt looked like this:

curs.execute("""SELECT * FROM myapp.new_user( %(fname)s, %(lname)s,
%(dob)s, %(city)s, %(state)s, %(zip)s""", qdict)

This is, according to some articles, blog posts, and at least one reply on the psycopg mailing list “the right way” to call a function using psycopg2 with PostgreSQL. I’m here to tell you that this is not correct to the best of my knowledge.The only real difference between this attempt and the last is I’ve replaced the “%” with a comma, which turns what *was* a string formatting operation into a proper SELECT with a psycopg2-recognized parameter list. I thought this would get psycopg2 to “just work”, but no such luck. I still had some quoting issues.

I have no idea where I read this little tidbit about psycopg2 being able to convert between Python and PostgreSQL data types, but I did. Right around the same time I was thinking “it’s goofy to issue a SELECT to call a function that doesn’t really want to SELECT anything. Can’t callproc() do this?” Turns out callproc() is really the right way to do this (where “right” is defined by the DB-API which is the spec for writing a Python database module). Also turns out that psycopg2 can and will do the type conversions. Properly, even (in my experience so far).

So here’s what I got to work:

callproc_params = [self.fname, self.lname, self.dob, self.city, self.state, self.zipcode]

curs.callproc('myapp.new_user', callproc_params)

This is great! Zero manual quoting or string formatting at all! And no “SELECT”. Just call the procedure and pass the parameters. The only thing I had to change in my code was to make my ‘self.dob’ into a datetime.date() object, but that’s super easy, and after that psycopg2 takes care of the type conversion from a Python date to a PostgreSQL date. Tomorrow I’m actually going to try calling callproc() with a list object inside the second argument. Wish me luck!

A quick cursor gotcha

I made a really goofy mistake. At the root of it, what I did was share a connection *and a cursor object* among all methods of a class I created to abstract database operations out of my code. So, I did something like this (this is not the exact code, and it’s untested. Treat it like pseudocode):

class MyData(object):
   def __init__(self, dsn): 
      self.conn = psycopg2.Connection(dsn)
      self.cursor = self.conn.cursor()

   def get_users_by_regdate(self, regdate, limit):
      self.cursor.arraysize = limit 
      self.cursor.callproc('myapp.uid_by_regdate', regdate)
      while True: 
         result = self.cursor.fetchmany()
         if not result:
            break 
         yield result

   def user_is_subscribed(self, uid): 
      self.cursor.callproc('myapp.uid_subscribed', uid)
      result = self.cursor.fetchone()
      val = result[0]
      return val

Now, in the code that uses this class, I want to grab all of the users registered on a given date, and see if they’re subscribed to, say, a mailing list, an RSS feed, a service, or whatever. See if you can predict the issue I had when I executed this:

    db = MyData(dsn)
    for id in db.get_users_by_regdate([joindate]):
        idcount += 1
        print idcount
        param = [id]
        if db.user_is_subscribed(param):
            print "User subscribed"
            skip_count += 1
            continue
        else:
            print "Not good"
            continue

Note that the above is test code. I don’t actually want to continue to the top of the loop regardless of what happens in production 🙂

So what I found happening is that, if I just commented out the portion of the code that makes a database call *inside* the for loop, I could print ‘idcount’ all the way up to thousands of results (however many results there were). But if I left it in, only 100 results made it to ‘db.user_is_subscribed’.

Hey, ‘100’ is what I’d set the curs.arraysize() to! Hey, I’m using the *same cursor* to make both calls! And with the for loop, the cursor is being called upon to produce one recordset while it’s still trying to produce the first recordset!

Tom Roberts, on the psycopg list, states the issue concisely:

The cursor is stateful; it only contains information about the last
query that was executed. On your first call to “fetchmany”, you fetch a
block of results from the original query, and cache them. Then,
db.user_is_subscribed calls “execute” again. The cursor now throws away all
of the information about your first query, and fetches a new set of
results. Presumably, user_is_subscribed then consumes that dataset and
returns. Now, the cursor is position at end of results. The rows you
cached get returned by your iterator, then you call fetchmany again, but
there’s nothing left to fetch…

…So, the lesson is if you need a new recordset, you create a new cursor.

Lesson learned. I still think it’d be nice if psycopg2 had more/better docs, though.

Share this:

  • Click to share on X (Opens in new window) X
  • Click to share on Reddit (Opens in new window) Reddit
  • Click to share on Tumblr (Opens in new window) Tumblr
  • Click to share on Facebook (Opens in new window) Facebook

Recent Posts

  • Auditing Your Data Migration To ClickHouse Using ClickHouse Local
  • ClickHouse Cheat Sheet 2024
  • User Activation With Django and Djoser
  • Python Selenium Webdriver Notes
  • On Keeping A Journal and Journaling
  • What Geeks Could Learn From Working In Restaurants
  • What I’ve Been Up To
  • PyCon Talk Proposals: All You Need to Know And More
  • Sending Alerts With Graphite Graphs From Nagios
  • The Python User Group in Princeton (PUG-IP): 6 months in

Categories

  • Apple
  • Big Ideas
  • Books
  • CodeKata
  • Database
  • Django
  • Freelancing
  • Hacks
  • journaling
  • Leadership
  • Linux
  • LinuxLaboratory
  • Loghetti
  • Me stuff
  • Other Cool Blogs
  • PHP
  • Productivity
  • Python
  • PyTPMOTW
  • Ruby
  • Scripting
  • Sysadmin
  • Technology
  • Testing
  • Uncategorized
  • Web Services
  • Woodworking

Archives

  • January 2024
  • May 2021
  • December 2020
  • January 2014
  • September 2012
  • August 2012
  • February 2012
  • November 2011
  • October 2011
  • June 2011
  • April 2011
  • February 2011
  • January 2011
  • December 2010
  • November 2010
  • September 2010
  • July 2010
  • June 2010
  • May 2010
  • April 2010
  • March 2010
  • February 2010
  • January 2010
  • December 2009
  • November 2009
  • October 2009
  • September 2009
  • August 2009
  • July 2009
  • June 2009
  • May 2009
  • April 2009
  • March 2009
  • February 2009
  • January 2009
  • December 2008
  • November 2008
  • October 2008
  • September 2008
  • August 2008
  • July 2008
  • June 2008
  • May 2008
  • April 2008
  • March 2008
  • February 2008
  • January 2008
  • December 2007
  • November 2007
  • October 2007
  • September 2007
  • August 2007
  • July 2007
  • June 2007
  • May 2007
  • April 2007
  • March 2007
  • February 2007
  • January 2007
  • December 2006
  • November 2006
  • September 2006
  • August 2006
  • July 2006
  • June 2006
  • April 2006
  • March 2006
  • February 2006
  • January 2006
  • December 2005
  • November 2005
  • October 2005
  • September 2005
  • August 2005
  • July 2005
  • June 2005
  • May 2005
  • April 2005
  • March 2005
  • February 2005
  • January 2005
  • December 2004
  • November 2004
  • October 2004
  • September 2004
  • August 2004
© 2025 Musings of an Anonymous Geek | Powered by Minimalist Blog WordPress Theme