
Musings of an Anonymous Geek

Made with only the finest 1's and 0's


PyTPMOTW: PyYAML

Posted on April 12, 2010 (updated April 13, 2010) by bkjones

What’s This Module For?

Reading and writing files formatted in YAML (“YAML Ain’t Markup Language”), and converting YAML syntax into native Python objects and datatypes.

What is YAML?

According to the website which houses the YAML Specification:

YAML™ (rhymes with “camel”) is a human-friendly, cross language, Unicode
based data serialization language designed around the common native data
structures of agile programming languages. It is broadly useful for
programming needs ranging from configuration files to Internet messaging to
object persistence to data auditing.

My introduction to YAML came several years ago in the context of messaging, and I then had a run-in with YAML as a logging format (actually, I was trying to parse a MySQL slow query log by coaxing it into YAML format). However, when I started writing Python full time, working on several different initiatives, YAML quickly became the standard configuration format.

Why? Simplicity. Using YAML for our config files and PyYAML to parse them, any developer can figure out what’s happening in our application in a matter of minutes, even if Python is not their primary language. It’s also nice that the YAML syntax is parsed into native Python datatypes, so Python coders looking at a config file can start to get a pretty good picture of how the program basically works.

The other thing that makes it simpler than some other config-specific options is that there’s not a lot of underlying “stuff” to know about. YAML isn’t a configuration engine; it’s essentially just a way to deal with data structures without locking the format to a specific language.

I also happen to like that it’s not config-specific, because it means that if I later need a messaging format, I already know one, and am familiar with a certain Python module to work with it!

Basic Usage

Let’s write a very simple YAML configuration for the logging portion of an application:

%YAML 1.2
---
Logging:
  format: "%(levelname) -10s %(asctime)s %(module)s:%(funcName)s()  %(message)s"
  level: 10
...

I’ve put the logging-related configuration in its own “section” (really a data structure) so that when I want to configure other parts of the application, I can do so without shooting myself in the foot by reusing the same key names.

I’ve stored this configuration in a file called ‘log.conf’. From there you can easily play with it in an interpreter session:

>>> import yaml
>>> with open('log.conf') as config_file:
...     config = yaml.safe_load(config_file)
...
>>> config
{'Logging': {'format': '%(levelname) -10s %(asctime)s %(module)s:%(funcName)s()  %(message)s', 'level': 10}}
>>>

With the configuration out of the way, let’s look at the code that would use it:

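#!/usr/bin/env python

import logging
import yaml

def doit(uid):
    # Pass uid as an argument; logging formats the message only if it is emitted.
    logging.debug("Working with uid: %s", uid)

if __name__ == "__main__":
    # safe_load builds only standard types (dicts, lists, strings, numbers...).
    with open('log.conf') as config_file:
        config = yaml.safe_load(config_file)
    # Each key under 'Logging' becomes a keyword argument to basicConfig().
    logging.basicConfig(**config['Logging'])

    doit(22222)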

logging.basicConfig() accepts optional keyword arguments, and the ** syntax unpacks the ‘Logging’ dictionary into them. Here I’m just using the ‘format’ and ‘level’ options, but there are more, such as ‘filename’, ‘filemode’, and ‘datefmt’.

The only thing I do inside the doit() function is use logging to output the value of ‘uid’ passed in. This is really a test that the format I’ve configured is actually being used.
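If you want to check that the configured format is really applied without creating log.conf on disk, here is a minimal sketch that embeds the same YAML in a string and sends log output to an in-memory buffer. It assumes Python 3.8 or later for basicConfig’s force argument, and it uses yaml.safe_load, which is all this plain configuration needs.

```python
import io
import logging

import yaml

LOG_CONF = """\
Logging:
  format: "%(levelname) -10s %(asctime)s %(module)s:%(funcName)s()  %(message)s"
  level: 10
"""

def doit(uid):
    logging.debug("Working with uid: %s", uid)

config = yaml.safe_load(LOG_CONF)
buffer = io.StringIO()
# stream= sends records to the buffer; force=True (Python 3.8+) replaces any
# handlers already attached to the root logger.
logging.basicConfig(stream=buffer, force=True, **config["Logging"])
doit(22222)

output = buffer.getvalue()
print(output)
```

The printed line should start with the padded level name and end with "doit()  Working with uid: 22222", which confirms the YAML-supplied format string is the one in use.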

The format is fairly intuitive: indentation defines a block, just like in Python. The ‘---’ and ‘...’ lines denote the beginning and end of the YAML document. You can have several documents in a file if you so choose. This might be done if you’re storing a feed or email threads in YAML format.
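For a file holding several documents, yaml.safe_load() won’t do, because it expects exactly one document in the stream; yaml.safe_load_all() yields one Python object per document instead. A minimal sketch, with made-up messages standing in for a feed:

```python
import yaml

# Two documents in one stream, each opened with '---' and closed with '...'.
# The subjects and bodies are sample data for illustration only.
FEED = """\
---
subject: First post
body: Hello
...
---
subject: Second post
body: World
...
"""

# safe_load_all returns a generator; list() reads every document.
messages = list(yaml.safe_load_all(FEED))
for msg in messages:
    print(msg["subject"])
```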

Type Conversion

Type conversion to the built-in Python primitives works very well and is very intuitive in my experience. The above would be parsed as a string for the ‘format’ key and an ‘int’ for the ‘level’ key. The entire block above will become a dictionary, and there is YAML syntax you can use to create lists, lists of lists, and so on, as well.

For example, let’s say I’m creating a Django-like web application framework and I’ve decided to store my URL-to-handler mappings in a YAML file. You could easily do it with a list of lists, which looks like this in YAML:

RequestHandlers:
- [/, framework.handlers.RootHandler]
- [/signup, framework.handlers.RegisterNow]
- [/login, framework.handlers.Login]
- [/faq, framework.handlers.FAQ]

This will form a list of lists that you can work with in your code that looks like this in the config dictionary:

{'RequestHandlers': [['/', 'framework.handlers.RootHandler'], ['/signup',
'framework.handlers.RegisterNow'], ['/login', 'framework.handlers.Login'],
['/faq', 'framework.handlers.FAQ']]}
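Since each entry is a two-item [path, handler] list, this structure drops straight into dict() if you want a lookup table keyed by URL. A quick sketch, treating the handler names above as plain strings (nothing here imports a real framework.handlers module):

```python
import yaml

URLS = """\
RequestHandlers:
- [/, framework.handlers.RootHandler]
- [/signup, framework.handlers.RegisterNow]
- [/login, framework.handlers.Login]
- [/faq, framework.handlers.FAQ]
"""

config = yaml.safe_load(URLS)

# Each inner list is a (path, dotted handler name) pair, so dict() turns
# the list of lists into a mapping from URL path to handler name.
routes = dict(config["RequestHandlers"])
print(routes["/login"])
```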

If for some reason type conversion doesn’t work as you expect, or you want a quoted value such as “Yes” to be read as a boolean rather than a string, you can explicitly tag your value using tags defined in the YAML specification for this very purpose. (PyYAML follows YAML 1.1, so an unquoted Yes, No, On, or Off is already read as a boolean; a single letter like “y” is not.) Here’s how you’d explicitly tag “Yes” as a boolean, to ensure it’s not parsed as a string:

verbose: !!bool "Yes"

When this is parsed by PyYAML, it will be a Python boolean, and the value when printed to the screen will be ‘True’ (without quotes). There are several other explicit type tags, including ‘!!int’, ‘!!float’, ‘!!null’, ‘!!timestamp’ and more.
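To see how PyYAML resolves these tags, and how tagged, quoted, and unquoted values differ, here’s a small sketch using yaml.safe_load, which understands the standard !! tags mentioned above. The key names are just samples:

```python
import yaml

DOC = """\
quoted: "Yes"
tagged: !!bool "Yes"
plain: Yes
port: !!int "6000"
ratio: !!float "1"
nothing: !!null ""
released: !!timestamp 2010-04-12
"""

values = yaml.safe_load(DOC)
# Show each key with the Python value (and therefore type) it became.
for key, value in values.items():
    print(f"{key}: {value!r}")
```

The quoted “Yes” stays a string, while both the tagged and the unquoted Yes come back as True; the date-only timestamp becomes a datetime.date.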

If you like, you could alter our URL mapper from above and create a list of tuples. Note the use of the !!omap tag, which is short for ‘ordered mapping’:

RequestHandlers: !!omap
- /: framework.handlers.RootHandler
- /signup: framework.handlers.RegisterNow
- /login: framework.handlers.Login
- /faq: framework.handlers.FAQ

The resulting config dictionary looks like this:

{'RequestHandlers': [('/', 'framework.handlers.RootHandler'), ('/signup',
'framework.handlers.RegisterNow'), ('/login', 'framework.handlers.Login'),
('/faq', 'framework.handlers.FAQ')]}
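Because !!omap comes back as an ordered list of (key, value) tuples, you can loop over it in file order or hand it to dict(), which preserves insertion order in Python 3.7 and later. A minimal sketch, loaded with yaml.safe_load, which supports !!omap:

```python
import yaml

URLS = """\
RequestHandlers: !!omap
- /: framework.handlers.RootHandler
- /signup: framework.handlers.RegisterNow
- /login: framework.handlers.Login
- /faq: framework.handlers.FAQ
"""

handlers = yaml.safe_load(URLS)["RequestHandlers"]

# Each item is a (path, handler) tuple, in the order written in the file.
for path, handler in handlers:
    print(path, "->", handler)

# Convert to a dict for lookups; the file order is kept.
routes = dict(handlers)
```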

More than once I’ve gone back to my YAML configuration to alter the type of data structure returned to better suit the code that uses it. It’s pretty convenient, and making the changes to both the configuration file and the code is typically easy enough to be a non-event.

Beyond Basic Data Types

The ‘level’ option in logging.basicConfig can be specified either as a level name such as ‘DEBUG’ (in Python 3.2 and later) or as a numeric value (internally, logging.DEBUG maps to the integer value 10). But what if you didn’t know this, or you didn’t have the option of using an integer? Specifying ‘logging.DEBUG’ in the config file wouldn’t have worked, because it would have come in as the string ‘logging.DEBUG’ rather than as a reference to the constant in the logging module.

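If you don’t care about locking your configuration file to a language, PyYAML will let you do what you need using language-specific tags. Note that PyYAML’s safe loader (yaml.safe_load()) rejects these python/ tags, because they can import modules and create arbitrary objects; loading them requires a less restrictive loader such as yaml.unsafe_load() (PyYAML 5.1 and later), which you should use only with config files you trust. So, for the purposes of our program, the following two lines in YAML produce the same effect: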

level: 10
level: !!python/name:logging.DEBUG

You might also choose to do this because reading ‘logging.DEBUG’, even with the added tag overhead, is probably easier to understand than trying to figure out what “10” means.
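A quick way to see the difference between the two loaders, as a sketch assuming PyYAML 5.1 or later (where yaml.unsafe_load() is available):

```python
import logging

import yaml

DOC = "level: !!python/name:logging.DEBUG\n"

# python/ tags can import modules, so they need the unsafe loader.
# Only do this with files you trust.
config = yaml.unsafe_load(DOC)
print(config["level"], config["level"] == logging.DEBUG)

# safe_load has no constructor for the tag and raises a YAMLError subclass.
try:
    yaml.safe_load(DOC)
    refused = False
except yaml.YAMLError:
    refused = True
print("safe_load refused the tag:", refused)
```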

If you’re developing code that allows users to write plugins, you can also let them add their plugins by adding a simple line to a ‘plugin’ section of the YAML config file in such a way that the config dictionary itself will contain an actual new instance of the proper object:

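Plugins:
- !!python/object/apply:MyPlugin.Processor {kwds: {logfile: foo.log}}
- !!python/object/apply:FooPluginModule.CementMixers.RotaryMixer {kwds: {consistency: chunky}}

The above will produce a list of plugin instances. With the !!python/object/apply tag, PyYAML calls each class with the values under ‘kwds’ as keyword arguments (and any values under ‘args’ as positional arguments), so each class’s __init__ method receives them. The similar-looking !!python/object/new tag calls the class’s __new__ method instead, which skips __init__. Don’t forget that if you want to access the plugins by name instead of looping over a list, you can easily make this a dictionary. PyYAML also supports passing more initialization info to the instance, such as a ‘state’ mapping.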
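MyPlugin.Processor and the cement mixer are hypothetical, so to try the mechanism yourself you need a class PyYAML can actually import. This sketch swaps in the standard library’s types.SimpleNamespace as a stand-in plugin class, because its __init__ simply stores keyword arguments as attributes, and it keys the plugins by name. Like the other python/ tags, !!python/object/apply needs an unsafe loader, so keep it to trusted files.

```python
import types

import yaml

# types.SimpleNamespace stands in for a real plugin class here.
PLUGINS = """\
Plugins:
  processor: !!python/object/apply:types.SimpleNamespace
    kwds: {logfile: foo.log}
  mixer: !!python/object/apply:types.SimpleNamespace
    kwds: {consistency: chunky}
"""

# object/apply calls the class, passing 'kwds' as keyword arguments.
config = yaml.unsafe_load(PLUGINS)
plugins = config["Plugins"]
print(plugins["processor"].logfile)
print(plugins["mixer"].consistency)
```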

Anchors and Aliases

You can create a block in your YAML config file, and then reference it in other sections of the configuration, and it can save you a lot of lines in a more complex configuration. This is done using anchors and aliases. An anchor starts with “&” and an alias (a reference to the anchor) begins with a “*”. So, let’s say you have multiple plugins loaded (continuing on from the example), and they all need their own configuration, but they’ll all connect to the same exact database server, and use the same credentials and db name, etc. Just create the db config once, make it an anchor, and reference it as needed:

DB: &MainDB
   server: localhost
   port: 6000
   user: dbuser
   db: myappdb
Plugins:
   loghandler: !!python/object/apply:MyLogHandler
      args: ['mylogfile.log']
      kwds:
         db: *MainDB

When this is read in, the dictionary defined in &MainDB is passed to the loghandler plugin as its ‘db’ keyword argument. (In this mapping form PyYAML only looks at a few reserved keys, such as ‘args’ and ‘kwds’, so a bare ‘db’ key next to ‘args’ would be silently ignored.) If you wanted to pass the *entire* config structure to your plugin, you technically wouldn’t need this, but I typically pass only the portion of the config structure that deals specifically with the plugin, because configs can get large, and the rest of the config could contain lots of things that have nothing to do with the plugin.
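One detail worth knowing: an alias doesn’t give you a copy. PyYAML builds the anchored node once, so every alias points at the same Python object, and mutating it through one path changes it everywhere. A sketch with plain mappings (no plugin classes) and yaml.safe_load:

```python
import yaml

CONF = """\
DB: &MainDB
   server: localhost
   port: 6000
   user: dbuser
   db: myappdb
Plugins:
   loghandler:
      logfile: mylogfile.log
      db: *MainDB
"""

config = yaml.safe_load(CONF)
db = config["Plugins"]["loghandler"]["db"]
print(db["server"], db["port"])

# The alias refers to the very same dict object as the anchor.
print(db is config["DB"])
```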

Moving Ahead

Although 90% of your use of PyYAML might well consist of loading a YAML file or message and working with the resulting data structure, it’s nice to know that it does provide quite a bit of flexibility if you’re willing to look for it. Here are some links for further reading about PyYAML, including a couple of items not covered in this tutorial:

Pass more initialization data to classes specified with !!python/object/new

Create your own app-specific tags, a la ‘!!bool’ and ‘!!python’.

Dump Python Objects to YAML

© 2023 Musings of an Anonymous Geek | Powered by Minimalist Blog WordPress Theme