Archive for the ‘Big Ideas’ Category

Brain Fried Over NoSQL

Saturday, June 26th, 2010

So, I’m working on a pet project. It’s in stealth mode. Just kidding — I don’t believe in stealth mode 😉

It’s a twitter analytics dashboard that actually does useful things with the mountains of data available from the various Twitter APIs. I’m writing it in Python using Tornado. Here’s the first mockup I ever did for it, just like 2 nights ago:

It’s already a lot of fun. I’ve worked with Tornado before and like it a lot. I have most of the base infrastructure questions answered, because this is a pet project and they’re mostly easy and in some sense “don’t matter”. But that’s what has me stuck.

It Doesn’t Matter

It’s true. Past a certain point, belaboring choices of what tools to use where is pointless and is probably premature optimization. I’ve been working with startups for the past few years, and I’m painfully aware of what happens when a company takes too long to react to their popularity. I want to architect around that at the start, but I’m resisting. It’s a pet project.

But if it doesn’t matter, that means I can choose tools that are going to be fun to dig into and learn about. I’ve been so busy writing code to help avoid or buffer impact to the database that I haven’t played a whole lot with the NoSQL choices out there, and there are tons of them. And they all have a different world view and a unique approach to providing solutions to what I see as somewhat different problems.

Why NoSQL?

Why not? I’ve been working with relational database systems since 1998. I worked on large data reporting projects, a couple of huge data warehousing projects, financial transaction systems, I worked for Sybase as a consulting DBA and project manager for a while, I was into MySQL and PostgreSQL by 2000, used them in production environments starting around 2001-02… I understand them fairly well. I also understand BDB and other “flat-file” databases and object stores. SQLite has become unavoidable in the past few years as well. It’s not like I don’t understand the compromises I’m making going to a NoSQL system.

There’s a good bit of talk from the RDBMS camp (seriously, why do they need their own camp?) about why NoSQL is bad. Lots of people who know me  would put me in the RDBMS camp, and I’m telling you not to cry yourself to sleep out of guilt over a desire to get to know these systems. They’re interesting, and they solve some huge issues surrounding scalability with greater ease than an RDBMS.

Like what? Well, cost for one. If I could afford Oracle I’d sooner use that than go NoSQL in all likelihood. I can’t afford it. Not even close. Oracle might as well charge me a small planet for their product. It’s great stuff, but out of reach. And what about sharding? Sharding a relational database sucks, and to try to hide the fact that it sucks requires you to pile on all kinds of other crap like query proxies, pools, and replication engines, all in an effort to make this beast do something it wasn’t meant to do: scale beyond a single box. All this stuff also attempts to mask the reality that you’ve also thrown your hands in the air with respect to at least 2 letters that make up the ACID acronym. What’s an RDBMS buying you at that point? Complexity.

And there’s another cost, by the way: no startup I know has the kind of enormous hardware that an enterprise has. They have access to commodity hardware. Pizza boxes. Don’t even get me started on storage. I’ve yet to see SSD or flash storage at a startup. I currently work at MyYearbook.com, and there are some pretty hefty database servers there, but it can hardly be called a startup anymore. Hell, they’re even profitable! 😉

Where Do I Start?

One nice thing about relationland is I know the landscape pretty well. Going to NoSQL is like dropping me in a country I’ve never heard of where I don’t really speak the language. I have some familiarity with key-value stores from dealing with BDB and Memcache, and I’ve played with MongoDB a bit (using pymongo), but that’s just the tip of the iceberg.

I heard my boss mention Tokyo Tyrant a few times, so I looked into it. It seems to be one of the more obscure solutions out there from the standpoint of adoption, community, documentation, etc., but it does appear to be very capable on a technical level. However, my application is going to be number-heavy, and I’m not going to need to own all of the data required to provide the service. I can probably get away with just incrementing counters in Memcache for some of this work. For persistence I need something that will let me do aggregation *FAST* without having to create aggregation tables, ideally. Using a key/value store for counters really just seems like a no-brainer.
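To make the counter idea concrete, here's a rough sketch. It assumes a plain in-memory Counter standing in for Memcache (a real client would use its own increment call), and the key format and function names are made up for illustration:

```python
from collections import Counter

# Stand-in for a key/value store such as Memcache; a real deployment would
# swap this for a client whose increment operation happens on the server.
store = Counter()


def counter_key(user, metric, day):
    """Build a flat key like 'mentions:alice:2010-06-26' (format is hypothetical)."""
    return "%s:%s:%s" % (metric, user, day)


def record(user, metric, day, amount=1):
    """Bump one counter and return its new value."""
    key = counter_key(user, metric, day)
    store[key] += amount
    return store[key]


def total(user, metric, days):
    """Aggregate across days by reading one key per day; no aggregation table."""
    return sum(store[counter_key(user, metric, d)] for d in days)


record("alice", "mentions", "2010-06-25")
record("alice", "mentions", "2010-06-26", 3)
print(total("alice", "mentions", ["2010-06-25", "2010-06-26"]))  # 4
```

The trade-off is that every question you want answered quickly needs its own key scheme decided up front, which is roughly the price of skipping the aggregation tables.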

That said, I think what I’ve decided to do, since it doesn’t matter, is punt on this decision in favor of getting a working application up quickly.

MySQL

Yup. I’m going to pick one or two features of the application to implement as a ‘first cut’, and back them with a MySQL database. I know it well, Tornado has a built-in interface for it, and it’s not going to be a permanent part of the infrastructure (otherwise I’d choose PostgreSQL in all likelihood).

To be honest, I don’t think the challenges in bringing this application to life are really related to the data model or the engine/interface used to access it (though if I’m lucky that’ll be a major part of keeping it alive). No, the real problem I’m faced with is completely unrelated to these considerations…

Twitter’s API Service

Not the API itself, per se, but the service providing access to it, and the way it’s administered, is going to be a huge challenge. It’s not just the Twitter website that’s inconsistent; the API service goes right along with it. Not only that, but the type of data I really need to make this application useful isn’t immediately available from the API as far as I can tell.

Twitter maintains rate limits on the API. You can only make so many calls over so short a period of time. That alone makes providing an application like this to a lot of people a bit of a challenge. Compounding the issue is that, when there are failwhales washing up on the shores, those limits can be dynamically decreased. Ugh.
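One common way to live with limits like these is to retry with exponential backoff. The sketch below is only an illustration of that pattern: the RateLimited exception and the fetch callable are hypothetical stand-ins, not part of any Twitter client library, and the delays are arbitrary.

```python
import time


class RateLimited(Exception):
    """Raised by a (hypothetical) API wrapper when the service refuses a call."""


def call_with_backoff(fetch, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on RateLimited wait base_delay, then 2x, 4x, ... and retry.

    Re-raises RateLimited if every attempt is refused.
    """
    for attempt in range(max_tries):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_tries - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Example: a fake fetch that is refused twice, then succeeds.
responses = iter([RateLimited(), RateLimited(), "some data"])


def fake_fetch():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


waits = []  # record the delays instead of actually sleeping
print(call_with_backoff(fake_fetch, sleep=waits.append), waits)
# some data [1.0, 2.0]
```

Passing the sleep function in as a parameter keeps the retry logic testable without waiting around, and it leaves room to cap or randomize the delay later.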

I guess it’s not a project for the faint of heart, but it’ll drive home some golden rules that are easy to neglect in other projects, like planning for failure (of both my application, and Twitter). Also, it’ll be a lot of fun.

Programmers that… can’t program.

Monday, March 15th, 2010

So, I happened across this post about hiring programmers, which references two other posts about hiring programmers. There seems to be a demand for blog posts about hiring programmers, but that’s not why I’m writing this. I’m writing because there was this sort of nagging irony that I couldn’t help but stumble upon.

In a blog post, Joel Spolsky talks about the mathematical inaccuracies associated with claims of “only hiring the top 1%”. It seemed pretty obvious to me that whether or not you’re hiring the top 1% of all programmers is pretty much unknowable, and when managers say they hire “the top 1%”, I assume they’re talking about the top 1% of their applicants. Note too that I always thought it was idiotic to point this out, because, well, isn’t that what you’re SUPPOSED to do? You’re not very well going to aim for the middle & hope for the best are you?

Apparently I’ve been giving too much credit to management. There I go giving people with ties on the benefit of the doubt again.

Then, in another blog post, Jeff Atwood talks about how it’s very difficult to even get interviews with programmers who can actually program. The problem is real.

The original blog post that pointed me at the two others is one by Roberto Alsina where he talks about his own methods for weeding out the non-programmers. He’s clearly seen the issue as well.

But if you open all three of these posts in separate tabs and read them, you’re likely to come away with the same basic problem I did:

  • Who the hell are these managers who can’t figure out a dead simple statistics problem?
  • How can a person fairly inept at simple math be qualified to make a hiring decision for anything but a summer intern?

That sorta blew my mind a little. But it blew my mind a lot when Atwood started describing the problems that interviewees *couldn’t* perform in an interview! One task described by Imran was called a ‘FizzBuzz’ question. Here’s one such question:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

Here’s the part that blew my mind: He says, and I quote:

Most good programmers should be able to write out on paper a program which does this in a under a couple of minutes.

Want to know something scary ? – the majority of comp sci graduates can’t. I’ve also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.

That’s amazing to me. I decided to quickly pop open a Python prompt and see if I could do it:

>>> for i in range(1,101):
...     if (i % 3 == 0) and (i % 5 == 0):
...             print i,'FizzBuzz'
...     elif i % 3 == 0:
...             print i, 'Fizz'
...     elif i % 5 == 0:
...             print i, 'Buzz'
...     else:
...             print i
...

Note that I’ve taken the liberty of printing out the numbers in addition to the required words. I’m playing the role of interviewer and interviewee here, and wanted to be able to easily verify that things were correct, since there was no time for unit testing :)

Turns out it worked on the first try! That was pasted directly from my terminal screen. I didn’t time myself, but it took far less than 5 minutes. This leads to my other question, of course, which is “if you’re going to complain about CS degree holders not writing good code, maybe it’s time to open the doors to non-CS degree holders?”
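For comparison with the terminal session above, here's a sketch of a version that follows the question to the letter (the words replace the numbers rather than sitting next to them). It assumes Python 3, where print is a function, and the fizzbuzz helper name is just an illustrative choice:

```python
def fizzbuzz(n):
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:  # multiple of both three and five
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


for i in range(1, 101):
    print(fizzbuzz(i))
```

Pulling the decision into a function that returns a string makes it easy to check a handful of values without eyeballing a hundred lines of output.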

If You Don’t Date Your Work, It Sucks.

Monday, January 18th, 2010

I probably get more upset than is reasonable when I come across articles with no date on them. I scroll furiously for a few minutes, try to see if the date was put in some stupid place like the fine print written in almost-white-on-white at the bottom of the post surrounded by ads. Then I skim the article looking for references to software versions that might clue me in on how old this material is. Then I check the sidebars to see if there’s some kind of “About this Post” block. Finally, I make a mental note of the domain in a little mental list I use to further filter my Google searches in the future. Then I close the browser window in disgust. If it weren’t completely gross and socially unacceptable to do so, I would spit on the floor every time this happened.

Why would you NOT date your articles? In almost every single theme for every single content management solution written in any language and backed by any database, “Date” is a default element. Why would you remove it? It is almost guaranteed to be more work to remove it. Why would you go through actual work to make your own writing less useful to others?

What happens when you don’t date your articles?

  1. People have no idea whether your article has anything to do with what they’re working on. If you wrote an article about the Linux kernel in 1996, it’s of no use to me *now*, even if it was pretty hardcore at the time.
  2. Readers are forced to skim your article looking for references to software versions to see if your article is actually meaningful to them or not. Why make it hard for people to know whether your article is useful? The only reason I can think of is that you already know your articles are old, so not dating them ensures that people at least skim enough to see some of the ads on your site. You are irreversibly lame if you do this.
  3. It causes near seizures in people like me who really hate when you don’t date your work, as well as all of your past teachers, who no doubt demanded that you sign and date your work.
  4. Every time you don’t date an article online, a seal pup is clubbed to death in the arctic, and a polar bear gets stranded on a piece of ice.

At some point, I will make an actual list of web sites that regularly do not date their work. A sort of hall of shame for sites that fail to link their writing to some kind of time-based context. If you have sites you’d like to add, let me know in the comments.

Create a Tagging Index Page with django-tagging

Tuesday, August 11th, 2009

For those not following along, I’ve been recreating LinuxLaboratory.org using Django. It’s my first Django project that you could call “real work”. I’ve been using the Django documentation, various blogs, and the 2nd edition of “Practical Django Projects”, which has given me a lot of ideas, and a lot of actual code. Some of it worked, some of it didn’t, some of it didn’t do things the way I wanted, and some of it was left as an exercise to the user. This was the case with some of the django-tagging-related stuff, which has been broken on the site for a while.

I finally got tired of tagging not working properly on LinuxLaboratory.org, so I started diving into the code and found that one of the things I wanted to do was actually pretty darn easy. In the process, I thought of something else I’ll probably implement later as well. Not *all* of my problems are solved, but I’m on my way!

So, LinuxLaboratory is made up of three different sections: a blog, which is its own app; a “Code” area, which is another separate app; and a content management system (flatpages right now) that will handle storing republished articles when I get around to importing them all.

I enabled tagging on everything. I’m not solidly in one camp or the other on the whole “Tagging everything is bad” debate. Rather than theorize, I decided to give it a go and see how it does. My guess is that once I add a search box it will rarely actually be used, but what do I know?

The problem this presented me with was trying to figure out a way to present the user with one big monstrosity of an “index lookup” page, which would list all of the tags, and for each tag, list links to anything of any content type that was tagged with it. I understand that this could become unwieldy at some point, but if I need to I suppose I can pretty easily paginate, or present alphabet links, or perhaps both!

Though I understood the potential for future disaster, it still bothered me that I couldn’t find a quick answer to the question, so here it is for those wanting to do something similar with django-tagging. For reference, my tag index page is here.

Template Tags in django-tagging

I had actually started creating a custom template tag, and was looking at the Django docs, which stated “a lot of apps supply custom template tags”. Duh! I cd’d into the tagging directory, and there was a directory called “templatetags”. The tagging_tags.py file inside was pretty well documented, and the tag I was about to write myself is called ‘tagged_objects’. Here’s the docstring for that tag:

"""
Retrieves a list of instances of a given model which are tagged with
a given ``Tag`` and stores them in a context variable.

Usage::

    {% tagged_objects [tag] in [model] as [varname] %}

The model is specified in ``[appname].[modelname]`` format.

The tag must be an instance of a ``Tag``, not the name of a tag.

Example::

    {% tagged_objects comedy_tag in tv.Show as comedies %}

"""

Perfect. I already had a tag_list.html template (which, if memory serves, is one of the things left as an exercise to the user in Practical Django Projects), and it listed the tags in use on the site, but instead of linking off to a ‘tag_detail’ page for each tag, I envisioned something more interesting. I’m not there yet, but this index page is step one.

Putting Together a Template

What I needed to do was simply {% load tagging_tags %}, and then call the {% tagged_objects %} tag with the proper arguments, which consist of a tag *object* (not a tag name), the model you want to grab instances of, and a variable name you want to store the instances in. Here’s the content block from my tag_list.html:

{% block content %}
{% load tagging_tags %}
{% for tag in object_list %}
<div id="entry">
   <p>{{ tag.name }}</p>
   <ul>
      {% tagged_objects tag in monk.Entry as taggedentries %}
      {% for entry in taggedentries %}
         <li><a href="{{ entry.get_absolute_url }}">{{ entry.title }}</a></li>
      {% endfor %}
      {% tagged_objects tag in ray.Snippet as taggedsnippets %}
      {% for snippet in taggedsnippets %}
         <li><a href="{{ snippet.get_absolute_url }}">{{ snippet.title }}</a></li>
      {% endfor %}
   </ul>
</div>
{% endfor %}
{% endblock %}

So, the view I’m using supplies Tag objects in a variable called object_list. For each Tag object, I spit out the name of the tag. Underneath that on the page, for each tag, there’s an unordered list. The list items are the Entries from my “monk” application, and my Snippets from my “ray” application. I hope reading this template along with the bit above from the docstring for the template tag helps someone out. And check out the other tags in tagging_tags.py as well!

Rome Wasn’t Built in a Day

Of course, there’s still an issue with my particular implementation. Tagging was originally implemented specifically for the entries in the “/weblog/” part of my site. However, now that tags have been applied to things in the “/snippets/” part of my site, this page doesn’t *really* belong in either one. And if you go to the page, you’ll see that the “Blog” tab in the navigation bar is still highlighted. I’ll figure out what to do with that at some point. Until then, enjoy, and if you have any input or wisdom to share, please leave comments! Also, you should follow me on twitter!

The Neverending Search for “Free” Wi-Fi

Tuesday, August 11th, 2009

So, I’m a freelancer. I work a lot on remote machines as a system administrator, a troubleshooter of LAMP stacks and web applications, etc. I also do a little bit of web development (but not design. I’m a horrible designer). I work from home a lot. I used to work outside of the home a lot, but what I found is that “free” wireless access has so many downsides that it’s just easier to stay home. I live in the Princeton, NJ area, and have attempted to get free wireless access at Barnes & Noble, Borders, Panera, Starbucks, and a few local businesses. Here’s what I found:

Panera Bread

Yes, the wireless access is free, but it kicks you off for TWO HOURS during the lunch rush. What makes this truly horrible is that there isn’t (as far as I know) an option to *pay* for your wireless access and bypass this limitation. The odd thing is that it seems to backfire on them: if I were able to browse my RSS feeds while I ate a nice Panera lunch, I’d probably stick around. As it stands, if I go there at all, I leave at lunch time and either go home or somewhere else. I’ll eat breakfast there because they don’t turn off wireless at that time.

Turning off wireless is just not acceptable for someone who needs it to be on pretty much all the time. Clearly, Panera isn’t catering to people who are going to hang around there all day, but maybe they should: if they didn’t turn off wifi, I’d spend more than double what I spend there in a given day. I get a coffee and maybe a pastry in the morning, but if wifi stayed on, or I had the option of paying for it, I’d add to that a Frontega Chicken sandwich, maybe a bread bowl of soup in the winter, and at least two lemonades.

But now… I go somewhere else.

Barnes & Noble

Barnes & Noble recently announced that they now have free wifi. The problems with going to BN for this are many. First, going free increases demand for free wifi, which of course increases the demand for power outlets. There are surprisingly few at the location near me. The cafe area in particular hasn’t got even one single power outlet.

But power availability isn’t the worst of it. The worst part is that AT&T runs the wifi access, and as soon as I saw that, I knew something was going to be completely wrong, and I was right: AT&T drops your DHCP lease every 2 hours. EVERY TWO HOURS. There’s no warning dialog either that pops up to say “hey, we’re gonna drop you in 10 minutes”. Things just disappear. Then you have to visit the registration page again and click a checkbox and a button to be reconnected.

Probably ok for a casual email checker, but not for anyone looking to hang out for a while and do “real work”.

Starbucks

Ugh. Forget it. AT&T runs this one as well, and when I asked at my local store how to get on, they asked about my Starbucks card. I have one of those black cards that they call a “Gold Card”. Whatever. The numbers are worn off of it, and I only use it as a discount card — it’s not registered. So it needs to be registered, and then I have to WAIT 48 HOURS, and then I’m entitled to 2 hours free wifi per day. But to register, I have to go through some procedure, and they had to find a way to retrieve the last 4 numbers on my card, because they put the numbers in the area that gets swiped (bright), and they’d rubbed off.

I considered getting one of the new mini cards, which has numbers embedded underneath the plastic, but it was recommended that I stick with only one card or the other. There was seemingly no valid reason for this. I didn’t understand the recommendation, but whatever.

The alternative is to pay for it on the spot, which I might’ve done, but the wifi was down when I tried to connect.

Anyway, this all seems rather messy, doesn’t it? Between my iPhone, Barnes & Noble, and Starbucks, AT&T is making nothing so clear as the fact that they don’t want my money.

Borders

I’m actually writing this post from a Borders bookstore. The wi-fi here IS NOT free. Know what that means? Well, it means I have to pay for it of course, but it also means there’s almost nobody here. In a cafe area that probably seats 60 or more, at 10:15AM, there are 4 people here, and I’m the only one with a laptop.

Wi-fi here is $8 for a day pass, which isn’t horrifically bad. What *is* pretty bad is that almost all of the chairs here are made of 100% hard wood with no padding of any kind. What is HORRIBLY HORRIBLY bad here is the food. If it’s advertised as edible, DON’T EAT IT. I mean bad. There aren’t English words to describe the badness. It’s No Bueno™. The selection of lunch-worthy food is super small, too. And bad. Did I say the food is bad? It is.

So I pay $8, I get access for 24 hours, and I can leave and walk across the parking lot for lunch, come back, and sign right back in. Not bad. If I had my lap desk with me, I could even sit in one of the well-padded armchairs. I feel a little guilty spending almost no money here, but I’ve *tried* to spend money on food and drinks, and I’ve really just been horribly disappointed. The only thing I’ll ingest here is the coffee. The saving grace for my conscience is that I’m paying for the wi-fi, so I don’t feel the need to spend money on stuff I might not otherwise be interested in.

The Locals Win It

Two local businesses stand out in terms of their free wifi offering. A local person that it turns out I actually know opened up a Camille’s Cafe, and there’s a local coffee shop in Hopewell that I am slowly starting to adore.

Camille’s is closer to my house, but it has, for the entire place, something like two power outlets, and they’re not placed very conveniently. However, the wifi is Really, Truly Free, and that’s good. The food is also good, and you can get healthy stuff there, so I don’t have to buy something deep-fried or made of 85% refined sugar to justify my being there sucking up their wifi.

The local coffee shop is perhaps my favorite place. The wifi is Really, Truly Free, and I would call the power situation “adequate”. The coffee and the food are both really good, and you can also get healthy stuff there. The only problem that exists at this place is parking, but usually I can get around that without too much trouble.

My Django Project Update: RSS Feed, “Home” Link, and more.

Monday, August 10th, 2009

In continuing the rebuild of LinuxLaboratory.org using Django, I’m happy to say that things have moved fairly smoothly. I’m using a good mix at this point of stuff from the 2nd edition of “Practical Django Projects”, the Django documentation, blog posts, and docs from other apps I’m making use of.

RSS

I said in one of my previous posts that I’d wait until I burned my feed before giving out the link, and I just did that, so if you want to subscribe to the LinuxLaboratory Blog feed, here’s the link to do that. Right now there’s just one feed for all of the blog entries, but since I post almost all of my really geeky stuff here, the LLO Blog will be mostly site updates like new articles, code, or features being added. The LLO Blog isn’t something that’s intended to get tons of traffic or have tons of posts all the time. The meat of the site will be the content management system which houses articles, and the “Snippets” area which will house scripts and hacks and stuff.

The “Home” Link in Django

I’m not sure why, but it took me a little time to figure out how to link to the base site in a Django template. I had some URL routing set up such that, well… here’s what I have:

In the main project’s urls.py:

(r'^$', include('monk.urls.entries'))

I named my blog app “monk”, after Thelonious Monk. There’s actually a reason I picked his name for a blog app, but it’s not important right now (though, for a chuckle, I picked his last name because his first name breaks a long-standing “8 character” tradition in UNIX).

Anyway, in the corresponding URLConf in monk, I have:

(r'^$', 'archive_index', entry_latest_dict, 'monk_entry_archive_index')

And then in one of my base templates I had this (which is perfectly valid code):

<a href="{% url 'monk_entry_archive_index' %}">LinuxLaboratory.org</a>

The ‘{% url %}’ tag can take a URLConf name as an argument, and it’ll do a reverse lookup to get the URL, which is nice, except that this would always land people at “http://linuxlaboratory.org/weblog”, and I wanted them to just go to the base URL for the site. The canonical home page. The root URL. Whatever you want to call it.

There are multiple ways to link back to the base domain from within a template, but I’m not sure if there’s a canonical, “Django-sanctioned” method. You can just make an href pointing to “/”, you can hard-code the whole URL, and I found that doing “href={{ settings.SITE_ID }}” also worked just fine. I tried that last one after discovering that the base URL for the site isn’t in settings.py, and reading that SITE_ID was used by some applications to help them figure out their own URL routing. SITE_ID is a numeric value that represents, according to the Django docs, “the current site in the django_site database table”.

That’s a little confusing, but if you just have a look at the table, it starts to become clear how this could work:

mysql> select * from django_site; 
+----+---------------------+---------------------+
| id | domain              | name                |
+----+---------------------+---------------------+
|  1 | linuxlaboratory.org | linuxlaboratory.org | 
+----+---------------------+---------------------+

It seems logical that using {{ settings.SITE_ID }} in a template could cause the right things to happen, but I haven’t gone diving into the source code to prove that it does.

What’s the canonical way of doing this?
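One possible answer (a sketch, not necessarily the sanctioned one) is a small context processor that hands every template the site root. The home_url name and the HOME_URL variable are made up for illustration, and FakeRequest exists only so the function can be tried outside of Django; the one real call assumed here is the request's build_absolute_uri() method.

```python
def home_url(request):
    """Context processor: make the site root available as {{ HOME_URL }}.

    build_absolute_uri("/") turns the path "/" into a full URL using the
    scheme and host of the current request.
    """
    return {"HOME_URL": request.build_absolute_uri("/")}


class FakeRequest:
    """Stand-in for Django's HttpRequest, just enough to try the function."""

    def build_absolute_uri(self, location):
        return "http://linuxlaboratory.org" + location


print(home_url(FakeRequest())["HOME_URL"])  # http://linuxlaboratory.org/
```

The function would need to be registered with the project's template context processors, and the template rendered with a request-aware context, before a link like <a href="{{ HOME_URL }}"> would resolve. A plain href pointing to “/” gets the same result with less machinery, so this mostly pays off if the absolute URL is needed elsewhere (feeds, emails).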

Up Next…

So, I have what I think will be a decent setup for code sharing (complete with highlighted syntax), a solid foundation for a blog app, and I’m working on the content management system. I’m using TinyMCE in the admin interface to edit blog posts as well as the CMS content. I’ve got very very basic CSS in place. The basics are here. Now what?

Well, first I need to get my ducks in a row. This includes:

  • Stabilizing a proper development and deployment workflow. There’s a rather nice setup over here, and the 2nd edition of Practical Django Projects has also been enlightening in this regard.
  • Cleaning up my templates. I created the Blog app first, and so of course now I have like 3 separate apps, and I don’t really want them all to have a different look and feel, so I need to abstract some bits and perhaps create a “/templates/common/” directory that will be referenced by all of the apps in the project…? How do you do it?
  • I’d like to get some high-level navigation horizontally across the top instead of having those links in the sidebar. I don’t want fancy popout menus — just a very simple bar where the nav links basically just represent the functions of the different apps: Blog, Articles (CMS), Code, and maybe an About link or something.
  • And more!!

After that stuff is out of the way I’ll start thinking about some new features:

  • I’d like to be able to include images (maybe multiple images) in various types of content. In fact, perhaps all kinds of content. Screenshots for the code snippets, stock images for CMS and blog content, etc. I’m a little intimidated by this because I know my web host (webfaction) limits the amount of memory I can use at any given time. I guess I can just manually scp images to my static media location and link them into the content, but it doesn’t seem ideal. Ideally I can upload them in the same interface where I edit the content, and maybe have the img src tag associated with the instance of the model I’m editing in the database. Does something do this already?
  • I’d really like to have a wordpress-style “Stats” page. Actually, I’d like to have a much better stats page than the wordpress one, but that’d be a start. The stats page I’d actually *like* to have is best described by Marty Allchin here (2 years ago. Anyone know of an app that is aiming for anything close to that?)
  • RSS feeds for the code snippets section (this should be simple)
  • On-the-fly PDF generation for downloading the PDF version of… whatever. A code snippet, an article… Haven’t even investigated this yet.
  • I really, really, really, really want to turn my geek conference calendar into a google maps mashup using GeoDjango. This is another bit I’m slightly intimidated by, because I didn’t realize GeoDjango was built-in these days, and I started out down this road using MySQL as my database for whatever reason (instead of PostgreSQL, which I actually like better and, as it turns out, has way more mature GIS functionality).
  • At some point, I’d like to ensure that the old URLs to content that was on the old site will actually still work, and land people on the same content in the new site. The URL layout isn’t actually horrifically different, so hopefully I can get that in place without *too* much fuss. I know Django has a redirection app built-in, but I’m not sure if this is the right way to do it, or if I should just use Apache rewrite rules. Anyone compared the two?

So those are the big goals. Some are simple, others less so, but I hope to complete all of this within the next… well, before the new baby is born, which is going to be some time in the first half of September. Wish me luck (on both), and please share your tips on how I might accomplish any of the above goals (I’ve heard all the tips I can handle about kids, thanks).

Django, Pygments, Templates, Code Sharing, and Design

Sunday, August 9th, 2009

Welcome to the latest update!

I spent a total of about an hour on the “Social Code Sharing” application in the 2nd edition of “Practical Django Projects”, and I’m not completely finished with it, but I’ve got syntax highlighting and the basics covered. I can add, edit, and delete snippets, list snippets, and show a particular snippet, complete with syntax highlighting. In an hour. That’s pretty ok, considering I spent a lot of that time debugging what I would call, at best, an unexplained inconsistency in the book that caused me to run into a series of errors before finally figuring out the right way to do it (it’ll all be in the book review – stay tuned for that).

I don’t have links or user contribution set up yet. I only have a quick example up right now, which you can see here, which brings me to the design question: I want my navigation sidebar on the right, but I really don’t want the code to overlap it in any way. I’m kinda wondering if there’s an elegant way to fix that. I considered moving the sidebar to the left, but didn’t want to do that. I considered just making the main content area bigger, but then I’m just guessing, and praying that it fixes the problem for all future code snippets. What’s there now is just a CSS-based solution that creates a scrollable area for the code if it’s too long. Maybe this will suffice (though I have some CSSing to do to refine what’s currently there — the entire content area scrolls right now!)

Since there’s no example template for dealing with code highlighting in the book, and the downloadable code only goes through the first app in the book (the CMS app), here’s the template I’m using for the snippet_detail.html file:

{% block title %}{{block.super}} | Random hacks{% endblock %}
{% block extrahead %}
<link rel="stylesheet" type="text/css" href="/static_media/llo_main/css/pygments.css" />
{% endblock %}
{% block content %}

 <ul>
 <li>{{ object.title }}</li>
 <li>Published: {{ object.pub_date }} Updated: {{ object.updated_date }}</li>
 <li>Language: {{ object.language }}</li>
 <li>Author: {{ object.author }}</li>
{% load markup %}
 <li>Description: {{ object.description_html|markdown }}</li>
 </ul>

<p>
{{ object.highlighted_code|markdown }}
</p>
{% endblock %}

Your CSS path will be different, in all likelihood, and there are some touches missing, like use of a variable to make the title bar dynamic and stuff, but right now I’m shooting for function — the form will come later.

For the snippet_list.html file, I found that what’s in the book just doesn’t work at all. It uses pagination using Django’s built-in pagination capabilities, but the small subset of a template that’s provided references a {{ page }} variable, which does nothing. The Paginator class appears to make individual pages available to templates in a variable you’d reference as {{ page_obj }}, not {{ page }}. Referencing {{ page_obj }} by itself (without referencing any attribute of it) results in output that says something like “<Page 1 of 1>”.

It’s not necessarily pretty to have those “< >” around the edges of the output, and it’s not necessary to use this method of getting the page number and the page count. The page number is available as the ‘number’ attribute of the page object, and the total number of pages is available from the associated paginator object’s ‘num_pages’ attribute, which you can grab via {{ page_obj.paginator.num_pages }}. You can see my snippet_list.html output here, and here’s my template (at time of writing – note that my site is a work in progress and can change without warning):

{% block content %}

<p>Page {{ page_obj.number }} of {{ page_obj.paginator.num_pages }}</p>
<p>
{% if page_obj.has_previous %}
<a href="?page={{page_obj.previous_page_number}}">Previous page</a>
{% endif %}
{% if page_obj.has_next %}
<a href="?page={{ page_obj.next_page_number }}">Next page</a>
{% endif %}</p>

{% for snippet in object_list %}
 <ul>
 <li><a href="{{snippet.get_absolute_url}}">Title: {{ snippet.title }}</a></li>
 <li>Published: {{ snippet.pub_date }} Updated: {{ snippet.updated_date }}</li>
 <li>Language: {{ snippet.language }}</li>
 <li>Author: {{ snippet.author }}</li>
 <li>Description: {{ snippet.description_html }}</li>
 </ul>
{% endfor %}

{% endblock %}
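To make the attribute-versus-method distinction in that template concrete, here is a toy, standard-library-only stand-in for a paginator and a page. It is not Django's code: `ToyPaginator` and `ToyPage` are names I made up, and they only mirror the names the template relies on (`number`, `paginator.num_pages`, `has_next`, `has_previous`, and the two `*_page_number` methods). In a template you write `page_obj.has_next` without parentheses because the template engine calls callables for you.

```python
import math


class ToyPage:
    """Stand-in for one page of results; NOT Django's Page class."""

    def __init__(self, number, paginator):
        self.number = number        # 1-based page number, a plain attribute
        self.paginator = paginator  # back-reference, like page_obj.paginator

    def has_next(self):
        return self.number < self.paginator.num_pages

    def has_previous(self):
        return self.number > 1

    def next_page_number(self):
        return self.number + 1

    def previous_page_number(self):
        return self.number - 1


class ToyPaginator:
    """Stand-in for a paginator; NOT Django's Paginator class."""

    def __init__(self, items, per_page):
        self.items = list(items)
        self.per_page = per_page

    @property
    def num_pages(self):
        # Report at least one page, even when the list is empty.
        return max(1, math.ceil(len(self.items) / self.per_page))

    def page(self, number):
        if not 1 <= number <= self.num_pages:
            raise ValueError("page %d is out of range" % number)
        return ToyPage(number, self)


snippets = ["snippet %d" % i for i in range(1, 8)]  # 7 hypothetical snippets
paginator = ToyPaginator(snippets, per_page=3)
first = paginator.page(1)
last = paginator.page(paginator.num_pages)
print("Page %d of %d" % (first.number, first.paginator.num_pages))  # Page 1 of 3
```

With seven snippets at three per page, the first page has a next page but no previous one, and the last page is the reverse, which is the condition the two `{% if %}` blocks in the template are testing.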

In the interest of full disclosure, I should also say that, at time of writing, I haven’t added enough snippets to fully test the previous and next page links. I’ll get there, I’m sure — especially when I get user contributions in place, which will happen when I feel a little more confident that this is going to happen in a secure manner. Having been to PHP land, it’s a little unnatural for me to trust the code that comes with a framework to do things properly, which my sysadmin brain always parses as “securely, and in a functionally correct manner”.

Please share tips in the comments! That’s the update for now — stay tuned for more as the recreation of LinuxLaboratory.org continues to unfold.

Django RSS Feed Finally Working

Thursday, August 6th, 2009

Ok, so now LinuxLaboratory has the following features working:

  • comments (along with comment moderation and akismet support),
  • a WYSIWYG editor for my posts
  • markdown support for comments
  • email notification of errors and new comments
  • a full-blown administration interface that lets me deal with any apps, users, or content on my site,
  • some cheesy CSS (hey, it’s better than black-on-white, at least IMHO)
  • an RSS feed
  • sidebar content that includes information about the current content, and links to other recent content on the site.
  • Maybe some other stuff I’ve forgotten. It’s late.

There’s only one RSS feed available at the moment, and I’m not linking to it because the URL will likely change and then I’ll have people all pissed at me, or worse, not subscribed to the feed. I’ll set up FeedBurner on it and post that URL when I get it set up. This way, if my feed URL changes, I can change it in the FeedBurner setup, and everyone else just subscribes using the FeedBurner URL. Handy.

I’m veering away from the book for some of this stuff. For those who weren’t following along, this is all being done with Django, and I’ve been using the 2nd Edition of Practical Django Projects to help me along both in getting this thing built, and in learning more about Django. I use Python for everything else — why not the web?

Anyway, I’m jumping around the book at this point, taking bits from various parts of the first 7 chapters, checking out the official django docs, looking at lots and lots of articles on blogs all over the place, and putting things together the way I want them. For now, I just want a stupid-dumb RSS feed. I’ll get to doing category-specific feeds and Atom feeds later (I also want comment feeds, and one of those cool things that links tweets about the post into the comment thread and stuff).

For now, I also just want a blog, and I’ll get to the CMS later. In fact, through the building of the blog application I’ve become pretty confident, so I’ll probably grab bits from the CMS and the Code Sharing application in the book and put something together that makes sense for my needs.

“Practical Django Projects”, Custom Template Tags, and Updates on LinuxLaboratory

Wednesday, August 5th, 2009

Hi all,

If you’ve not been following along, I’ve picked up the 2nd edition of Practical Django Projects, and am using it to help me reinvent LinuxLaboratory.org (LLO). Though LLO is really a documentation site where I republish articles I’ve written for Linux.com, O’Reilly, and others over the past 10 years, I started out by putting a blog application in place as a way for me to communicate with people interested in how LLO is being built. Once I get comments and RSS in place, it will serve that purpose much more effectively. Until then, I’m putting more updates here than there.

So let’s have a look at what I’ve gotten done so far, and the stumbling blocks I’ve run into.

Look and Feel

Just looking at the blog application in its current form, you can see that I’ve done a thing or two to customize the look and feel so it’s not just plain black text on a white background. I added a simple CSS stylesheet that puts the sidebar where it should be, and updates the font in use. If you’re not using IE, you can probably also see a thin dotted line around the entries and the sidebar. This is probably temporary, and helps me quickly debug CSS issues I’m likely to come across early on (I’m not a hardcore designer, if you couldn’t tell). The accomplishment there was understanding my web host’s setup enough to ensure that static files like CSS stylesheets are served by the “main” Apache instance, and not my private instance which is hosting my Django application. This helps keep memory consumption down, and keeps things moving quickly.

Here’s a shot to give you an idea of the current overall look and feel. DISCLAIMER: It’s not impressive.

It’s brown. I like brown, and there aren’t tons of brown sites everywhere. I’ll likely change things as I move forward, because I also have a logo for the site and some other ideas and such. I also have aspirations beyond just a blog and CMS site. I’m going to add applications that actually do stuff, in part just to see if I can, in part because I’ll use them, and in part because others might find them useful. For example, I’ll be adding a subnet calculator, and a bandwidth delay product calculator.

The Sidebar: Entries, Categories, Links, and Maybe Tags

First, you should know that what appears on any given page of a Django site, along with what it looks like, the content it holds, and any other attribute can be modified/dynamic/etc for each page of the site without reproducing a whole bunch of code. Django gives you so many shortcuts to use to make this “just happen” that it’s really educational to go through the process of building a Django application, even if you hate Python and refuse to use it for real work. There are lessons to be learned here that you might find useful back in your development platform/framework of choice.

My sidebar right now contains the main “Navigation” links, a “What is this?” entry for each page you land on telling you what you’re looking at, a list of the most recent entries on the blog, and the most recent links posted to the Links section.

The “Recent Links” and “Recent Entries” sections show an interesting bit of Django magic. There are separate tables in the database for Links and Entries, and the fields aren’t all the same in the tables, and yet, I didn’t write one bit of SQL to get the data out, and didn’t write very much code at all to present it. Django supplies a few collections of default “views”, and one collection of them is for presenting data that is “date-based”. Pass in a few parameters to tell it the model (Link or Entry), and the number of recent entries to grab, and it goes off and does it for you.
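The core idea can be sketched without Django at all: one function serves any kind of record, as long as you tell it which date field to sort on and how many items you want. This is plain Python for illustration only, not how Django's date-based views are implemented. The `Entry` and `Link` tuples, the `latest` function, and its parameter names are all hypothetical.

```python
from collections import namedtuple
from datetime import date
from operator import attrgetter

# Two record types with different fields; they only share a date field.
Entry = namedtuple("Entry", ["title", "body", "pub_date"])
Link = namedtuple("Link", ["title", "url", "pub_date"])


def latest(items, date_field="pub_date", count=5):
    """Return the newest `count` items, newest first, for any record type."""
    return sorted(items, key=attrgetter(date_field), reverse=True)[:count]


entries = [
    Entry("First post", "body text", date(2009, 7, 21)),
    Entry("Template tags", "body text", date(2009, 8, 5)),
    Entry("RSS works", "body text", date(2009, 8, 6)),
]
links = [
    Link("Django docs", "http://example.com/django", date(2009, 8, 1)),
    Link("Pygments", "http://example.com/pygments", date(2009, 8, 9)),
]

recent_entries = latest(entries, count=2)
recent_links = latest(links, count=1)
```

The function never needs to know whether it was handed entries or links, which is roughly why one generic view can cover both sidebar sections.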

Believe me, this *does* seem a little confusing at first. Figuring out how an http request is handled, how all of the data is gathered and passed to a template, and then finally rendered, takes a bit of time. Debug enough issues in the development of your first app or two, and you’ll get it cemented in your brain.

The one thing that still doesn’t work right is the “Tags” link on the sidebar. In the book, this is implemented using the django-tagging module, which is a third party application. Setting up my first 3rd party app for use with Django was no trouble at all, but I think the book should’ve gone through a bit more hand-holding in dealing with django-tagging. I hit a few glitches in using it at first, but because I had a lot of Django’s basic workings figured out by then, I was able to fix things on my own. Others might not be so lucky.

The django-tagging app’s model doesn’t really look like one I’ve seen so far in my Django travels, and is completely different from the models I created for my blog app. I figured out what was going on, and I have some idea what the path to success will look like for creating the “tag reference” page I’m hoping to build, but I decided that I’d put it aside and move on to dealing with things that are more immediately useful. I rarely if ever use tags on, say, this blog. RSS feeds and comments are in the next chapter, and I really can’t live without those.

Before I moved on to RSS and comments, though, I wanted to understand “custom template tags” in Django, so I went through the end of Chapter 6 and created one, and then created a more generic one to replace the first one. The more generic one (get_latest_content) caused Django to issue 500 errors. The idea of ‘get_latest_content’ is that it’s more generic to create one template tag that can take arguments telling it what “content” is than to create separate template tags for each type of content on the site. Unfortunately, I *believe*, but am not *sure* that there’s actually a bug in Django that makes this not work.

To implement the custom tag, you need to use a method called “get_model”, which I believe is supposed to return an object of type “model”, which will then have an attribute called “_default_manager”. What *actually* happens is it seems to be returning a “unicode” object, which has no such attribute, and Django tells you so.

So, I created separate “get_latest_entries” and “get_latest_links” template tags. Here are my custom template tag definitions:

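from django import template

# Hypothetical import path; adjust to wherever your Entry and Link models live.
from blog.models import Entry, Link

# Create one Library and register every tag in this module against it.
register = template.Library()


def do_latest_entries(parser, token):
    return LatestEntriesNode()


class LatestEntriesNode(template.Node):
    def render(self, context):
        # Put the five most recent live entries into the template context.
        context['latest_entries'] = Entry.live.all()[:5]
        return ''


register.tag('get_latest_entries', do_latest_entries)


def do_latest_links(parser, token):
    return LatestLinksNode()


class LatestLinksNode(template.Node):
    def render(self, context):
        # Same idea for links, using the default manager.
        context['latest_links'] = Link.objects.all()[:5]
        return ''


register.tag('get_latest_links', do_latest_links)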

If ‘get_latest_content’ worked, I’d only need to register one tag to do the jobs of both of these tags, but it’s not like this is a horribly difficult bit of code to manage, so it’s a workaround until either I find my typo or Django fixes their bug. This is a rare instance in which I do *not* feel like a workaround for a problem is a hack that’s going to chomp down on my “jewels” later on.

If you run into this issue and decide to go this route, don’t do what I did at first and call “register = template.Library()” for each tag. You create *one* new library, and then register *all* of your tags to it. The book (through Chapter 6 anyway) only has one tag at a time in there, so it’s not covered.
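For comparison, here is a rough, Django-free sketch of what a single generic tag has to do: take the tag's arguments as text, turn the “app.model” string into an actual class, and only then reach for its manager. Everything here is an assumption for illustration. The tag syntax (`get_latest_content app.model N as varname`), the `MODEL_REGISTRY` dict, `lookup_model`, and the toy classes are mine, and the registry merely stands in for a real model lookup. I can’t say this is what went wrong in my case, but it does show why a plain string in place of the class gives the “no such attribute” error.

```python
class ToyManager:
    """Stand-in for a model manager; it only supports all()."""

    def __init__(self, rows):
        self._rows = rows

    def all(self):
        return list(self._rows)


class Entry:
    _default_manager = ToyManager(["entry 1", "entry 2", "entry 3"])


class Link:
    _default_manager = ToyManager(["link 1", "link 2"])


# Hypothetical registry playing the role of a model lookup.
MODEL_REGISTRY = {("blog", "entry"): Entry, ("blog", "link"): Link}


def lookup_model(app_label, model_name):
    return MODEL_REGISTRY[(app_label.lower(), model_name.lower())]


def parse_tag(tag_text):
    """Parse 'get_latest_content app.model N as varname' into its parts."""
    bits = tag_text.split()
    if len(bits) != 5 or bits[3] != "as":
        raise ValueError("expected: get_latest_content app.model N as varname")
    app_label, model_name = bits[1].split(".")
    # Resolve the string to a class here; passing bits[1] along unresolved
    # would hand a plain string to render(), which has no _default_manager.
    model = lookup_model(app_label, model_name)
    return model, int(bits[2]), bits[4]


def render(tag_text, context):
    model, num, varname = parse_tag(tag_text)
    context[varname] = model._default_manager.all()[:num]
    return ""


context = {}
render("get_latest_content blog.entry 2 as latest_entries", context)
render("get_latest_content blog.link 5 as latest_links", context)
```

After those two calls, `context` holds both lists under the names given after `as`, which is the job the two separate tags above do between them.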

Here’s some code from the part of the template that uses the custom tags:

<h2>Recent Entries</h2>
<ul>
  {% get_latest_entries %}
  {% for entry in latest_entries %}
    <li><a href="{{entry.get_absolute_url}}">{{entry.title}}</a>
posted {{entry.pub_date|timesince}} ago.</li>
  {% endfor %}
</ul> 
<h2>Recent Links</h2>
<ul>
   {% get_latest_links %}
   {% for link in latest_links %}
     <li><a href="{{link.get_absolute_url }}">{{ link.title }}</a>, 
posted {{ link.pub_date|timesince}} ago.</li>
   {% endfor %} 
</ul>

It’s been a lot of text, so here’s another screen shot, this time of my ‘Categories’ page, which I altered a bit from the book: I wanted mine to be an index-style listing that shows the category, and all posts in that category, instead of just showing categories and making the user click to see the entries in that category. Chances are the user isn’t really just curious to see what categories exist. Chances are also that some day I’ll be sorry I did it this way because I’ll have so many categories and posts that browsing this page will be cumbersome, but it’s *SO* easy to change it around that I’ll deal with it when I get there.

The Big Win: The Admin Interface

Django fanboys are quick to point out that Django’s admin interface is not just a “battery included”, but rather a “diesel-fueled power generator included”, and I’m inclined to agree. Writing admin interfaces is no fun, in part because end users never see it, so you can’t really show it off. Admin interfaces I’ve seen and written for in-house applications are usually design nightmares, and require some tribal knowledge to use effectively. I applaud the Django folks for doing the best job I’ve seen thus far at automatically creating an admin interface to manage pretty much *every* aspect of the site.

Once I created the data models for links, tags, categories, and entries, I was able to immediately use the admin interface to create new blog entries, add new links, new categories, etc. Of course, I can edit and delete items using the admin interface as well. The admin interface’s goal is to be functional — it doesn’t assume it’s going to be used by folks used to using Microsoft Word, or Emacs for that matter. Input is all just textarea elements, but if you want to you can plug in TinyMCE and give admins a wysiwyg admin interface. The book shows you how to use TinyMCE in its creation of a CMS application. I didn’t do the CMS app, but have downloaded TinyMCE and plan to use it as soon as comments and RSS are working on the blog.

Here’s a shot of the admin interface, which I did just about *nothing* to create.

This is, specifically, the part of the admin interface dealing with entries (blog posts). From here I can add a new entry (using that gray button in the top right corner), I can click an existing entry to edit it, or I can check the box next to an entry and choose “delete” from the “Action” drop down menu to delete it. It’s simple, but functional enough that I’d imagine most people using this for their own needs won’t find it necessary to create another one — at least not from scratch. The admin interface *is* customizable.

Apps and Projects

Another reason I started off building the blog app from the book is because it’s a standalone application, and not a project like the CMS, which is the first project in the book. I plan to take this experience with building a standalone application and then go back to build a standalone CMS as well instead of building the CMS as a project, and the blog as an app that can be used by the CMS project. I’m not sure what the logic was in setting the book up that way, but it seems to me (and I could be so, so laughably wrong here) that the project should contain as little code as possible. Preferably none if possible. It should contain URLConfs (and as few of those as possible), and settings.py. Anything else should be decoupled from the project if it’s feasible. Django makes it very easy to decouple URLs and templates and the like, and the community advocates as much decoupling as possible.

In fact, my own blog app is actually a standalone application that lives in its own directory and can be tar’d up and moved elsewhere at the drop of a hat. Django *apps* are linked into a Django *project*, which is what I think of as the “site-specific” collection of settings like database connection info, admin emails, etc. Drop an app in some directory, list it in the “INSTALLED_APPS” setting in your project’s ‘settings.py’ file, and you’re off and running.
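As a rough illustration, the relevant fragment of a project’s settings.py might look like the following. The `django.contrib` entries are the usual bundled apps; `blog` is a hypothetical name for the standalone app, so substitute whatever your app’s package is actually called.

```python
# settings.py (fragment): the project only lists which apps are plugged in.
INSTALLED_APPS = (
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.sites",
    "django.contrib.admin",
    "blog",  # hypothetical name for the standalone blog app
)
```

The app’s own code lives in its own directory somewhere on the Python path; the project just names it here.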

Stay Tuned!

I hope to have RSS and Comments enabled on LLO in the next day or so, time allowing. I’m also maintaining my consulting business while I’m doing this, so time isn’t always on my side — speaking of which, I offer discounted consulting rates to work on Python projects, because I really like using Python, and now that I’m making friends with Python on the web, it looks like I might’ve finally achieved the dream of having one language that I can use to do systems development as well as web development. I don’t do much desktop GUI stuff… but who knows?

Until I get RSS and comments up, subscribe to this blog, and follow me (@bkjones) on Twitter.

LinuxLaboratory woes, Drupal -> Django?

Tuesday, July 21st, 2009

Ugh…

So, today I tried browsing to one of my sites, linuxlaboratory.org, and found a 403 “Forbidden” error. Calling support, they said it was a “billing issue”. Well, I pay my bills, and I haven’t received any new credit cards, so I’m not sure what that’s about. Further, they haven’t contacted me in any way shape or form at all in a very long time, and I’ve had the same email addresses for years now. Last time they failed to contact me, it was because they were sending all of the mail to “root@localhost” on the web server.

What’s more, the tech support guy, having determined that this wasn’t a technical but an administrative problem, transferred me to a sales person who was not there. I left a message. That was 3 hours ago. So I took matters into my own hands and changed the name server records to my webfaction account, and linuxlaboratory.org now points to an old test version of the site that uses Drupal.

It’s Over Between Us…

Drupal holds the record for the CMS that has run LinuxLaboratory the longest. Since its launch in 2001, LinuxLaboratory has used all of the major, and some of the minor open source PHP CMSes. Drupal gave me something very close to what I wanted, out of the box. Nowadays, Drupal is even nicer since they redid some of the back end APIs and attracted theme and module developers to the project. I’ve even done some coding in Drupal myself, and have to say that it really is a breeze.

But the problem is this: I’m a consultant, trainer, and author/editor. I am an experienced system admin, database admin, and infrastructure architect who makes a living solving other people’s problems. I really can’t afford to have something that is super high overhead to maintain running my sites. With Drupal releasing new versions with major security fixes once per month on average, and no automated update mechanism (and no built-in automated backup either), it becomes pretty cumbersome just to keep it updated.

This is in addition to my experiences trying to do e-commerce with Drupal. I tried to use one plugin, but soon found myself in dependency hell — a situation I’m not used to being in unless I’m on a command line somewhere. So, out with Drupal. I know it well and I’m sure I’ll find a use for it somewhere in my travels, but not now, and not for this.

Is Django the Future of LinuxLaboratory?

So I’m thinking of giving Django another shot. In fact, I thought I might try something new and interesting. Maybe I’ll build my Django app right in front of everyone, so that anyone who is interested can follow along, and so people can give me feedback and tips along the way. It also lets me share with people who have questions about a feature I’m implementing or something like that.

For fanboys of <insert technology here>, know this: I’m a technology whore. I consume technology like some people consume oxygen. I love technology, and I get on kicks, and every now and then, a “kick” turns into a more permanent part of my tool chest. Python is one such example. I’ve done lots with Python, but have never really made friends with it for web development. I got a webfaction account specifically because they support Python (and Django). I’ve done nothing with it. Now I think I might.

But not to worry! I own lots of domains that are sitting idle right now, and I’m considering doing a Ruby on Rails app for one of them, and I’m dying to do more with Lua. There’s only so much time!

Webfaction Django Users: Advice Hereby Solicited

So if you’re a webfaction customer using Django, please share your tips with me about the best way to deploy it. I’ve used nothing but PHP apps so far, and found that rather than use the one-click installs webfaction provides, it’s a lot easier to just choose the generic “CGI/PHP” app type and install the code myself. This allows me to, for example, install and update WordPress using SVN. Is Django a similar story, or does webfaction actually have an auto-upgrade mechanism for this? How are you keeping Django up to date?

Thanks!