Musings of an Anonymous Geek

Made with only the finest 1's and 0's

Python Packaging, Distribution, and Deployment: Volume 1

Posted on December 1, 2010 by bkjones

This is just Volume 1. I’ll cover as much as I can and just stop when it gets so long most people will stop reading 🙂

I’ve been getting to know the Python packaging and distribution landscape way better than I ever wanted to over the last couple of weeks. After 2 or 3 weeks now, I’m saddened to report that I still find it quite painful, and not a little bit murky. At least it gets clearer and not murkier as I go on.

I’m a Senior Operations Developer at myYearbook.com, which produces a good bit of Python code (and that ‘good bit’ is getting bigger by the day). We also open source lots of stuff (also growing, and not all Python). I’m researching all of this so that when we develop new internal modules, they’re easy to test and deploy on hundreds of machines, and when we decide to open source that module, it’s not painful for us to do that, and we can distribute the packages in a way that is intuitive for other users without jumping through hoops because “we do it different”. One process and pipeline to rule them all, as it were.

There are various components involved in building an internal packaging and distribution standard for Python code. Continuous integration, automated testing, and automated deployment (maybe someday “continuous deployment”) are additional considerations. This is a more difficult problem than I feel it should be, but it’s a pretty interesting one, and I’ll chronicle the adventure here as I have time. Again, this is just Volume 1.

Part 1: Packaging, Distribution, and Deployment

Let’s define the terms. By ‘packaging’ I mean assembling some kind of singular file object containing a Python project, including its source code, data files, and anything it needs in order to be installed. To be clear, this would include something like a setup.py file perhaps, and it would not include external dependencies like a NoSQL server or something.

By ‘distribution’, I mean ‘how the heck do you get this beast pushed out to a couple hundred machines or more?’

Those two components are necessary but not sufficient to constitute a ‘deployment’, which I think encompasses the last two terms, but also accounts for things like upgrades, rollbacks, performing start/stop/restarts, running unit tests after it gets to its final destination but before it kills off the version that is currently running happily, and other things that I think make for a reliable, robust application environment.

With definitions out of the way, let’s dive into the fun stuff.

Part 2: Interdependencies

Some knowledge of packaging in Python is helpful when you go to discuss distribution and deployment. The same goes for the other two components. When you start out looking into the various technologies involved, at some point you’re going to look down and notice that pieces of your brain appear to have dropped right out of your ears. Then you’ll reach to pull out your hair only to realize that your brain hasn’t fallen out your ears: it has, in fact, exploded.

If you’re not careful, you’ll find yourself thinking things like ‘if I don’t have a packaging format, how can I know my distribution/deployment method? If I don’t have a deployment method, how do I know what package format to use?’ It’s true that you can run into trouble if you don’t consider the interplay between these components, but it’s also true that the Python landscape isn’t really all that treacherous compared to other jungles I’ve had to survive in.

I believe the key is to just take baby steps. Start simple. Keep the big picture in mind, but decide early to not let the perfect be the enemy of the good. When I started looking into this, I wanted an all-singing all-dancing, fully-automated end-to-end, “developer desktop to production” deployment program that worked at least as well as those hydraulic vacuum thingies at the local bank drive-up windows. I’ll get there, too, but taking baby steps probably means winding up with a better system in the end, in part because it takes some time and experience to even identify the moving parts that need greasing.

So, be aware that it’s possible to get yourself in trouble by racing ahead with, say, eggs as a package format if you plan to use pip to do installation, or if you want to use Fabric for distribution of tarballs but are in a Windows shop with no SSH servers or tarball extractors.

Part 3: Package Formats

  • tar.gz (tarballs)
  • zip
  • egg
  • rpm/deb/ebuild/<os-specific format here>
  • None

If you choose not to decide, you still have made a choice…

Don’t forget that picking a package format also includes the option to not use a package format. I’ve worked on projects of some size that treated a centralized version control system as a central distribution point as well. They’d manually log into a server, do an svn checkout (back when svn was cool and stuff), test it, and if all was well, they’d flip a symlink to point at the newly checked out code and restart. Deployment was not particularly automated (though it could’ve been), but some aspects of the process were surprisingly good, namely:

  • They ran a surprisingly complete set of tests on every package, on the system it was to be deployed on, without interrupting the currently running service. As a result, they had a high level of confidence that all dependencies were met, the code acted predictably, and the code was ‘fit for purpose’ to the extent that you can know these things from running the available tests.
  • Backing out was Mind-Numbingly Easy™: since moving from one version to the next consisted of changing where a symlink pointed and restarting the service, backing out meant doing the exact same thing in reverse: point the symlink back at the old directory, and restart.
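
The symlink flip described above can be sketched in a few lines of Python. This is only an illustration of the idea, not what that team actually ran (they did it by hand): the function name activate_release and the paths in the usage note are made up, it assumes Python 3 on a POSIX system, and the service restart is left out entirely.

```python
import os


def activate_release(release_dir, current_link):
    """Repoint current_link at release_dir and return the previous target.

    Hypothetical helper: the caller keeps the returned path around so that
    backing out is just another call with the old directory.
    """
    previous = os.readlink(current_link) if os.path.islink(current_link) else None

    # Create the new symlink under a temporary name, then rename it over the
    # live one. On POSIX the rename replaces the old link in a single step,
    # so there is no moment where the link is missing.
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current_link)
    return previous
```

Going forward would be something like previous = activate_release("/srv/app/releases/r1234", "/srv/app/current") followed by a restart, and backing out would be activate_release(previous, "/srv/app/current") followed by another restart. Both paths are placeholders.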

I would not call that setup “bad”, given the solutions I’ve seen. It just wasn’t automated at all to speak of. It beats to hell what I call the “Pull & Pray” deployment scenario, in which your running service is a VCS checkout, and you manually log in, cd to that directory, do a ‘pull’ or ‘update’ or whatever command does an in-place update of the code in your VCS, and then just restart the service. That method is used in an astonishingly large number of projects I’ve worked on in the past. Zero automation, zero testing, and any confidence you might find in a solution like that is, dare I say, hubris.

Python Eggs

I don’t really want to entertain using eggs and the reasoning involves understanding some recent history in the Python packaging and distribution landscape. I’ll try to be brief. If you decide to look deeper, here’s a great post to use as a starting point in your travels.

distutils is built into Python. You create a setup.py file, you run ‘python setup.py install’, and distutils takes over and does what setup.py tells it to. That is all.

Setuptools was a response to features a lot of people wanted in distutils. It’s built atop distutils as a collection of extensions that add these features. Included with setuptools is the easy_install command, which will automatically install egg files.

Setuptools hasn’t been regularly and actively maintained in a year or more, and that got old really fast with developers and other downstream parties, so some folks forked it and created ‘distribute’, which is setuptools with all of the lingering patches applied that the setuptools maintainer never merged. They also have big plans for distribute going forward. One is to do away with easy_install in favor of pip.

pip, at time of writing, cannot install egg files.

So, in a nutshell, I’m not going to lock myself to setuptools by using eggs, and I’m not going down the road of manually dealing with eggs, and pip doesn’t yet ‘do’ eggs, so in my mind, the whole idea of eggs being a widely-used and widely-supported format is in question, and I’m just not going there.

If I’m way out in left field on any of that, please do let me know.

Tarballs

The Old Faithful of package formats is the venerable tarball. A 30-year-old file format compressed using a 20-year-old compression tool. It still works.

Tarball distribution is dead simple: you create your project, put a setup.py file inside, create a tarball of the project, and put it up on a web server somewhere. You can point easy_install or pip at the URL to the tarball, and either tool will happily grab it, unpack it, and run ‘python setup.py install’ on it. In addition, users can also easily inspect the contents of the file without pip or easy_install using standard tools, and wide availability of those tools also makes it easy to unpack and install manually if pip or easy_install aren’t available.

Python has a tarfile module in the standard library as well, so if you wanted to bypass every other available tool you could easily use Python itself to script a solution that packages your project and uploads it with no external Python module dependencies.
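
As a rough sketch of the packaging half of that idea, here is what building the tarball with nothing but the standard library’s tarfile module might look like. The function name build_tarball and the name-version.tar.gz naming scheme are my own assumptions for illustration, and the upload step is left out because it depends entirely on where you host things.

```python
import os
import tarfile


def build_tarball(project_dir, name, version, dest_dir="."):
    """Pack project_dir into dest_dir/<name>-<version>.tar.gz and return the path.

    Hypothetical helper. dest_dir should live outside project_dir so the
    archive does not end up trying to include itself.
    """
    base = "%s-%s" % (name, version)
    archive_path = os.path.join(dest_dir, base + ".tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname puts every file under a single top-level directory, so
        # unpacking yields <name>-<version>/setup.py and friends.
        tar.add(project_dir, arcname=base)
    return archive_path
```

Calling build_tarball("myproject", "myproject", "0.1", dest_dir="dist") would produce dist/myproject-0.1.tar.gz, assuming the dist directory already exists.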

Zip Files

I won’t go into great detail here because I haven’t used zip files on any regular basis in years, but I believe that just about everything I said about tarballs is true for zip files. Python has a zipfile module, zip utilities are widely available, etc. It works.
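
For completeness, here is roughly the same thing with the zipfile module. Again, this is a sketch under my own assumptions (the build_zip name and the name-version.zip layout are invented for the example), not a recommendation of one layout over another.

```python
import os
import zipfile


def build_zip(project_dir, name, version, dest_dir="."):
    """Pack project_dir into dest_dir/<name>-<version>.zip and return the path.

    Hypothetical helper. dest_dir should live outside project_dir.
    """
    base = "%s-%s" % (name, version)
    archive_path = os.path.join(dest_dir, base + ".zip")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # zipfile has no recursive add, so walk the tree and add each file
        # under a single top-level <name>-<version>/ directory.
        for root, _dirs, files in os.walk(project_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                rel_path = os.path.relpath(full_path, project_dir)
                zf.write(full_path, arcname=os.path.join(base, rel_path))
    return archive_path
```

Unlike tarfile’s add(), which recurses into directories on its own, this version has to walk the project tree itself.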

Distro/OS-specific Package Formats

I’ve put this on the ‘maybe someday’ list. It’d be great if, in addition to specifying Python dependencies, your package installer could also be aware of system-level dependencies that have nothing to do with Python, except that your app requires them. 🙂

So, if your application is a web app but requires a local memcache instance, RPM can manage that dependency.

I’m not a fan of actually building these things though, and I don’t know many people who are. Sysadmins spend their entire early careerhood praying they’ll never have to debug an RPM spec file, and if they don’t, they should.

That said, the integration of the operations team into the deployment process is, I think, a good thing, and leveraging the tools they already use to manage packages to also manage your application is a win for everyone involved. Sysadmins feel more comfortable with the application because they’re far more comfortable with the tools involved in installing it, and developers are happy because instead of opening five tickets to get all of the system dependencies right, they can hand over the RPM or deb and have the tool deal with those dependencies, or at least have the tool tell the sysadmin “can’t install, you need x, y, and z”, etc.

Even if I went this route someday, I can easily foresee keeping tarballs around, at least to keep developers from having to deal with package managers if they don’t want/need to, or can’t. In the meantime, missing system-level dependencies can be caught when the tests are run on the machine being deployed to.
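
One cheap way to catch a missing system-level dependency during that on-machine test run is to check that the executables your app needs are actually on the PATH. The sketch below is just one possible approach: the missing_executables name is made up, the list of required programs is whatever your app happens to need, and it relies on shutil.which, which only exists in Python 3.3 and later.

```python
import shutil


def missing_executables(required):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if shutil.which(name) is None]
```

A pre-deployment test could then assert that missing_executables(["memcached"]) comes back empty, assuming memcached is the binary name on your systems. This only proves the program is installed, not that it is running or configured correctly.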

Let Me Know Your Thoughts

So that’s it for Volume 1. Let me know your thoughts and experiences with different packaging formats, distribution, deployment, whatever. I expect that, as usual in the Python community, the comments on the blog post will be as good as or better than the post itself. 🙂
