I didn’t know this little tidbit until yesterday and want to get it posted so I can refer to it later.
I have this YAML config file that’s kinda long and has a lot of duplication in it. This isn’t what I’m working on, but let’s just say that you have a bunch of backup targets defined in your YAML config file, and your program rocks because each backup target can be defined to go to a different destination. Awesome, right?
Well, it might be, but it might also just make your YAML config file grotesque (and error-prone). Here’s an example:
    Backups:
        Home_Jonesy:
            host: foo
            dir: /Users/jonesy
            protocol: ssh
            keyloc: ~/.ssh/id_rsa.pub
            Destination:
                host: bar
                dir: /mnt/array23/homes/jonesy
                check_space: true
                min_space: 80G
                num_archives: 4
                compress: bzip2
        Home_Molly:
            host: eggs
            dir: /Users/molly
            protocol: sftp
            keyloc: ~/.ssh/id_rsa.pub
            Destination:
                host: bar
                dir: /mnt/array23/homes/jonesy
                check_space: true
                min_space: 80G
                num_archives: 4
                compress: bzip2
Now with two backups, this isn’t so bad. But if your environment has 100 backup targets and only one destination, or, heck, even if there are three destinations, should you have to write out the definition of those same three destinations for each of 100 backup targets? What if you need to change how one of the destinations is connected to, or the name of a destination changes, or array23 dies?
Ideally, you’d be able to reference the same definition in as many places as you need it and have things “just work”, and if something needs to change, you just change it in one place. Enter anchors and aliases.
An anchor is defined just like anything else in YAML, except that you label the definition block with “&labelname”. You can then reference it elsewhere in your config with an alias, “*labelname”. So here’s how our above configuration would look:
    BackupDestination-23: &Backup_To_ARRAY23
        host: bar
        dir: /mnt/array23/homes/jonesy
        check_space: true
        min_space: 80G
        num_archives: 4
        compress: bzip2
    Backups:
        Home_Jonesy:
            host: foo
            dir: /Users/jonesy
            protocol: ssh
            keyloc: ~/.ssh/id_rsa.pub
            Destination: *Backup_To_ARRAY23
        Home_Molly:
            host: eggs
            dir: /Users/molly
            protocol: sftp
            keyloc: ~/.ssh/id_rsa.pub
            Destination: *Backup_To_ARRAY23
With only two backup targets, the benefit is small, but keep trying to imagine this config file with about 100 backup targets and only one or two destinations. This removes a lot of duplication and makes things easier to change and maintain (and read!).
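To make the "change it in one place" point concrete, here is a minimal sketch using PyYAML. The `CONFIG` string is a trimmed-down, hypothetical version of the config above, and `destination_hosts` is a helper name made up for this example. The string replacement just stands in for editing the one anchored definition in the file.

```python
import yaml  # PyYAML

# Hypothetical, trimmed-down version of the config above.
CONFIG = """
BackupDestination-23: &Backup_To_ARRAY23
    host: bar
    dir: /mnt/array23/homes/jonesy
Backups:
    Home_Jonesy:
        host: foo
        Destination: *Backup_To_ARRAY23
    Home_Molly:
        host: eggs
        Destination: *Backup_To_ARRAY23
"""


def destination_hosts(text):
    """Return {target name: destination host} for every backup target."""
    cfg = yaml.safe_load(text)
    return {name: target["Destination"]["host"]
            for name, target in cfg["Backups"].items()}


before = destination_hosts(CONFIG)
# Simulate editing the single anchored definition ("host: bar" occurs once).
after = destination_hosts(CONFIG.replace("host: bar", "host: baz"))

print(before)
print(after)
```

Every target that uses the alias picks up the new host, even though only one line of the config changed.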
The cool thing about it is that if you already have code that reads the YAML config file, you don’t have to change it at all: PyYAML resolves the aliases for you. Here’s a quick interpreter session:
    >>> import yaml
    >>> from pprint import pprint
    >>> stream = open('foo.yaml', 'r')
    >>> cfg = yaml.safe_load(stream)
    >>> pprint(cfg)
    {'BackupDestination-23': {'check_space': True,
                              'compress': 'bzip2',
                              'dir': '/mnt/array23/homes/jonesy',
                              'host': 'bar',
                              'min_space': '80G',
                              'num_archives': 4},
     'Backups': {'Home_Jonesy': {'Destination': {'check_space': True,
                                                 'compress': 'bzip2',
                                                 'dir': '/mnt/array23/homes/jonesy',
                                                 'host': 'bar',
                                                 'min_space': '80G',
                                                 'num_archives': 4},
                                 'dir': '/Users/jonesy',
                                 'host': 'foo',
                                 'keyloc': '~/.ssh/id_rsa.pub',
                                 'protocol': 'ssh'},
                 'Home_Molly': {'Destination': {'check_space': True,
                                                'compress': 'bzip2',
                                                'dir': '/mnt/array23/homes/jonesy',
                                                'host': 'bar',
                                                'min_space': '80G',
                                                'num_archives': 4},
                                'dir': '/Users/molly',
                                'host': 'eggs',
                                'keyloc': '~/.ssh/id_rsa.pub',
                                'protocol': 'sftp'}}}
Notice how everything has been expanded: each Destination now shows the full definition from the anchor.
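One thing the pprint output hides: PyYAML doesn’t copy the anchored block for each alias. Every alias resolves to the same Python dict. Here’s a small sketch of what that means in practice, again with a hypothetical trimmed-down `CONFIG` string. If your program modifies a destination after loading, take a copy first (`copy.deepcopy` here is just one way to do that).

```python
import copy

import yaml  # PyYAML

# Hypothetical, trimmed-down version of the config above.
CONFIG = """
BackupDestination-23: &Backup_To_ARRAY23
    host: bar
    compress: bzip2
Backups:
    Home_Jonesy:
        host: foo
        Destination: *Backup_To_ARRAY23
    Home_Molly:
        host: eggs
        Destination: *Backup_To_ARRAY23
"""

cfg = yaml.safe_load(CONFIG)
shared = cfg["BackupDestination-23"]
jonesy = cfg["Backups"]["Home_Jonesy"]["Destination"]
molly = cfg["Backups"]["Home_Molly"]["Destination"]

# Both aliases point at the very same dict as the anchored block.
same_object = (jonesy is shared) and (molly is shared)

# So a change made through one reference is visible through the others.
jonesy["compress"] = "gzip"
molly_compress = molly["compress"]

# A deep copy gives one target its own independent destination.
independent = copy.deepcopy(molly)
independent["compress"] = "xz"

print(same_object, molly_compress, shared["compress"], independent["compress"])
```

If you only ever read the config, the sharing is harmless and you can ignore it.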
Enjoy!