I didn’t know this little tidbit until yesterday and want to get it posted so I can refer to it later.
I have this YAML config file that’s kinda long and has a lot of duplication in it. This isn’t what I’m working on, but let’s just say that you have a bunch of backup targets defined in your YAML config file, and your program rocks because each backup target can be defined to go to a different destination. Awesome, right?
Well, it might be, but it might also just make your YAML config file grotesque (and error-prone). Here’s an example:
Backups:
  Home_Jonesy:
    host: foo
    dir: /Users/jonesy
    protocol: ssh
    keyloc: ~/.ssh/id_rsa.pub
    Destination:
      host: bar
      dir: /mnt/array23/homes/jonesy
      check_space: true
      min_space: 80G
      num_archives: 4
      compress: bzip2
  Home_Molly:
    host: eggs
    dir: /Users/molly
    protocol: sftp
    keyloc: ~/.ssh/id_rsa.pub
    Destination:
      host: bar
      dir: /mnt/array23/homes/jonesy
      check_space: true
      min_space: 80G
      num_archives: 4
      compress: bzip2
Now with two backups, this isn’t so bad. But if your environment has 100 backup targets and only one destination (or heck, even three destinations), should you have to write out the definition of those same destinations for each of 100 backup targets? What if you need to change how one of the destinations is connected to, or the name of a destination changes, or array23 dies?
Ideally, you’d be able to reference the same definition in as many places as you need it and have things “just work”, and if something needs to change, you just change it in one place. Enter anchors and aliases.
An anchor is defined just like anything else in YAML, except that you label the definition block with “&labelname”. You can then reference it elsewhere in your config with an alias, “*labelname”. So here’s how our above configuration would look:
BackupDestination-23: &Backup_To_ARRAY23
  host: bar
  dir: /mnt/array23/homes/jonesy
  check_space: true
  min_space: 80G
  num_archives: 4
  compress: bzip2
Backups:
  Home_Jonesy:
    host: foo
    dir: /Users/jonesy
    protocol: ssh
    keyloc: ~/.ssh/id_rsa.pub
    Destination: *Backup_To_ARRAY23
  Home_Molly:
    host: eggs
    dir: /Users/molly
    protocol: sftp
    keyloc: ~/.ssh/id_rsa.pub
    Destination: *Backup_To_ARRAY23
With only two backup targets, the benefit is small, but keep trying to imagine this config file with about 100 backup targets and only one or two destinations. This removes a lot of duplication and makes things easier to change and maintain (and read!).
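To make the “change it in one place” point concrete, here’s a minimal sketch. It assumes PyYAML is installed and uses a trimmed-down copy of the config above held in a string; `CONFIG_TEXT` and `destination_hosts` are names I made up for illustration, not part of any real program.

```python
import yaml  # PyYAML

# Trimmed-down version of the config above (illustrative only).
CONFIG_TEXT = """
BackupDestination-23: &Backup_To_ARRAY23
  host: bar
  dir: /mnt/array23/homes/jonesy
Backups:
  Home_Jonesy:
    host: foo
    Destination: *Backup_To_ARRAY23
  Home_Molly:
    host: eggs
    Destination: *Backup_To_ARRAY23
"""


def destination_hosts(text):
    """Return {backup name: destination host} for every backup target."""
    cfg = yaml.safe_load(text)
    return {name: target["Destination"]["host"]
            for name, target in cfg["Backups"].items()}


before = destination_hosts(CONFIG_TEXT)
# Simulate editing the single anchored definition in the file.
after = destination_hosts(CONFIG_TEXT.replace("host: bar", "host: baz"))

print(before)
print(after)
```

The destination host appears only once in the text, under the anchor, so a single edit there shows up in every backup target that uses the alias.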
The cool thing about it is that if you already have code that reads the YAML config file, you don’t have to change it at all: PyYAML resolves the aliases for you when it loads the file. Here’s a quick interpreter session:
>>> import yaml
>>> from pprint import pprint
>>> with open('foo.yaml') as stream:
...     cfg = yaml.safe_load(stream)  # safe_load: plain data only, no arbitrary objects
...
>>> pprint(cfg)
{'BackupDestination-23': {'check_space': True,
                          'compress': 'bzip2',
                          'dir': '/mnt/array23/homes/jonesy',
                          'host': 'bar',
                          'min_space': '80G',
                          'num_archives': 4},
 'Backups': {'Home_Jonesy': {'Destination': {'check_space': True,
                                             'compress': 'bzip2',
                                             'dir': '/mnt/array23/homes/jonesy',
                                             'host': 'bar',
                                             'min_space': '80G',
                                             'num_archives': 4},
                             'dir': '/Users/jonesy',
                             'host': 'foo',
                             'keyloc': '~/.ssh/id_rsa.pub',
                             'protocol': 'ssh'},
             'Home_Molly': {'Destination': {'check_space': True,
                                            'compress': 'bzip2',
                                            'dir': '/mnt/array23/homes/jonesy',
                                            'host': 'bar',
                                            'min_space': '80G',
                                            'num_archives': 4},
                            'dir': '/Users/molly',
                            'host': 'eggs',
                            'keyloc': '~/.ssh/id_rsa.pub',
                            'protocol': 'sftp'}}}
And notice how each alias shows up as the full contents of the anchored block.
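One caveat worth knowing: as far as I can tell, PyYAML doesn’t copy the anchored block for each alias. Every alias ends up pointing at the same Python dict. The sketch below (assuming PyYAML; the config and variable names are invented for the example) shows what that means if your code modifies the loaded config.

```python
import copy

import yaml  # PyYAML

# Tiny illustrative config: two targets sharing one anchored destination.
SHARED_TEXT = """
Defaults: &dest
  host: bar
  num_archives: 4
A:
  Destination: *dest
B:
  Destination: *dest
"""

cfg = yaml.safe_load(SHARED_TEXT)

# Both aliases resolve to one and the same dict object.
same_object = cfg["A"]["Destination"] is cfg["B"]["Destination"]

# So a change made through A is visible through B as well.
cfg["A"]["Destination"]["num_archives"] = 7
leaked = cfg["B"]["Destination"]["num_archives"]

# Deep-copying a single target gives it a private destination dict.
private_a = copy.deepcopy(cfg["A"])
private_a["Destination"]["num_archives"] = 1
unchanged = cfg["B"]["Destination"]["num_archives"]

print(same_object, leaked, unchanged)
```

If you only ever read the config, none of this matters. If you do tweak per-target settings after loading, copy the target first. Note that `copy.deepcopy` on the whole config keeps the sharing intact inside the copy, which is why the sketch copies one target at a time.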
Enjoy!