I’ve developed in a few different environments, including multi-tier ones with middle-tier Java app servers and the like, but it always seemed pretty straightforward to serve something directly from disk. In the case of PHP, everything is served from disk: there’s no middleware to speak of, so you can throw a robots.txt file in place and it “just works”. With Django, it’s slightly different, for two reasons:
- Django shouldn’t be serving static content (and it therefore makes doing so a little inconvenient, though not impossible).
- Django works kinda like an application server: it expects to receive URLs, and it expects some configuration to be in place telling it how to deal with each URL.
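That URL-to-handler mapping can be sketched in plain Python. This is a hypothetical stand-in for Django’s URLconf and view machinery, not the actual Django API; the pattern list, the `dispatch` helper, and the robots.txt body are all made up for illustration:

```python
import re

def robots_txt(request_path):
    # In Django this would be a view returning an HttpResponse with
    # content_type="text/plain"; here we just return the body.
    return "User-agent: *\nDisallow: /admin/\n"

# A Django URLconf is essentially a list of (pattern, view) pairs like this.
urlpatterns = [
    (re.compile(r"^/robots\.txt$"), robots_txt),
]

def dispatch(path):
    # Walk the patterns in order and call the first view that matches,
    # which is roughly what Django's URL resolver does.
    for pattern, view in urlpatterns:
        if pattern.match(path):
            return view(path)
    return None  # Django would raise a 404 here
```

The point is that nothing gets served unless a pattern explicitly claims the URL, which is why a file dropped on disk doesn’t “just work” the way it does under PHP.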
If you have Django serving static content, you’re wasting resources, so I’m not covering that here. My web host is WebFaction, which gives you access to the configuration of your own Apache instance in addition to your Django installation’s configuration (in fact, I’m just running an svn checkout of django-trunk). That gives you a lot of flexibility in how you deal with static files like CSS, images, or a robots.txt file. To handle robots.txt on the “staging” version of my site, I added the following lines to my Apache httpd.conf file:
```apache
LoadModule alias_module modules/mod_alias.so

<Location "/robots.txt">
    SetHandler None
</Location>

Alias /robots.txt /home/myusername/webapps/mywsgiapp/htdocs/robots.txt
```
If you don’t load mod_alias, Apache will fail with an error saying that the “Alias” directive is misspelled or not supported. I use “<Location>” here instead of “<Files>” or “<Directory>” because I’m applying the rule only to explicit incoming requests for “/robots.txt”, and it isn’t likely that I’ll have more than one URL reaching that file, since I’m not aware of engines that look for robots.txt in some other way. “<Directory>” applies rules to an entire directory and its subdirectories, and “<Files>” applies rules to a file on disk, so those rules would apply even if there were more than one URL mapping to the file.
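For contrast, here’s roughly what the file-based alternative would look like. This is a hypothetical sketch, not something from my actual httpd.conf; it matches on the on-disk file name, so it would cover every URL that happens to map to that file:

```apache
# Matches the file itself, regardless of which URL reached it.
<Files "robots.txt">
    SetHandler None
</Files>

# Matches everything under the directory tree on disk.
<Directory "/home/myusername/webapps/mywsgiapp/htdocs">
    SetHandler None
</Directory>
```

Since crawlers only ever request the one canonical URL, the broader matching these containers provide doesn’t buy anything here, which is why “<Location>” is the better fit.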