Archive for the ‘LinuxLaboratory’ Category

Stop Doing Things That Don’t Work (a.k.a: Excel and Virtual Private Servers are Evil)

Wednesday, October 29th, 2008

Note that I’m talking about using these tools in some kind of professional way, and more specifically, I’m talking about using Excel as a database, and using VPS hosting to host “professional” web sites. By “professional”, I mean something other than your personal blog, picture gallery, or other relatively inconsequential site.

Excel is not a database

Here’s the thing: Excel isn’t a database. Most people who don’t work in IT don’t seem to understand this, and they’re deathly afraid to actually communicate with anyone in IT, so they take matters into their own hands, and create problems so big that IT is forced to get involved, because at some point this spreadsheet becomes “critical” to some business function. Then IT gets even more bitter toward the non-IT folk, validating some of the reasons the non-IT folk went that route in the first place, and virtually guaranteeing that they won’t come to the IT group next time either.

So, if you don’t work in IT and are not a geek, know this: Excel is not a database. Excel is not meant to manage data on a long-term basis. For everything you can do with Excel, there is almost certainly a better tool for the job. This isn’t to say that Excel is good for *nothing*, just that it’s generally not good in places where data needs to be managed over the longer term, shared with others, and relied upon for day-to-day operations of a business or department.

Find someone in IT who seems nice and “deals with databases”, and ask them what their thoughts are on the topic. Then tell them the *actual problem you’re trying to solve*, and ask how they would approach it. You’re not likely to hear “Excel” in the reply unless Excel is so rampant in your company that it’s become a corporate standard for creating data fiefdoms, which would be bad.

A VPS is Not “Professional Grade”. Ignore Adverts to the Contrary

No, really – I mean it. I’ve done plenty of consulting for companies who need some kind of fire put out for one of their web sites. Not long into the conversation I learn (for about 50% of the calls I get) that the site is externally hosted on a VPS. Occasionally I get people whose sites are, or are supposed to be, hosted on dedicated servers, but the actual VPS/dedicated server isn’t really the whole issue. The issue is with how these things are configured, and your ability to do what you need with them.

Marketing for VPS and dedicated server hosting often says “full root access” somewhere in the list of features. There are also specs like the CPU speed, amount of RAM, and bandwidth limits. All of these come together to give the unwitting customer the notion that they’re getting full root access to some kind of behemoth server with all kinds of resources. However, things go downhill when you see things like cPanel, Plesk, or anything else that looks like “easy management through web-based administrative interface”. Again, this is probably fine for something that gets 100 hits per month or so and isn’t critical. The minute you can attach a cost to the problems that can arise with your site, you need to ditch these hosting plans.

Why? There are numerous reasons, but I’ll start with three:

  1. There’s typically no failover or “high availability”: if one machine goes down, or one VPS on the same hardware goes nuts, you’ve just ceased to exist on the internet at all.
  2. The CPU and RAM advertised are used mostly by the bloated software used to automate the management and monitoring of the systems (in other words, they’re used by your hosting provider, not your own application).
  3. The system configurations I’ve seen in these environments border on absurd, and since the end user is managing all of this through a web interface, the only folks left to blame are the providers. So when you have problems, they’re guaranteed to be extra-challenging to solve.

What kinds of system configuration issues? Well, how about every service turned on, every port open (and not filtered) by default? How about downright broken service configurations, ranging from named.conf (DNS) configs specifying features that *can’t* work as configured, to crippled package management tools that disallow package modifications because doing so would break the monitoring/management tools, to php.ini files that turn on display_errors and turn *off* log_errors? In general, logging configurations are poor or worse, making problem-solving an uphill battle. Every time I log into a VPS I am typically shocked and appalled at what I find. Even if it’s $5 a month, it’s not worth it.
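
To make the php.ini complaint concrete (`display_errors` on, `log_errors` off), here’s a rough sketch of a check you could run against a host’s php.ini. It’s only an illustration: the function name and the sample text are made up, it only understands simple `key = value` lines, and it only reports values that are set explicitly in the file.

```python
def risky_php_settings(ini_text):
    """Return warnings for error-handling settings set explicitly in php.ini text."""
    settings = {}
    for raw in ini_text.splitlines():
        # Naive comment handling: everything after ';' is dropped.
        line = raw.split(";", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip().strip('"').lower()

    on = {"1", "on", "true", "yes"}
    off = {"0", "off", "false", "no"}
    warnings = []
    if settings.get("display_errors") in on:
        warnings.append("display_errors is on: errors are shown to visitors")
    if settings.get("log_errors") in off:
        warnings.append("log_errors is off: errors are not written to a log")
    return warnings


# Hypothetical excerpt from a badly configured host.
sample = """
; error handling
display_errors = On
log_errors = Off
"""
for warning in risky_php_settings(sample):
    print(warning)
```

On a production site you generally want the opposite of the sample: errors written to a log, not shown to visitors.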

Think about it: if you have a VPS and you have database corruption, what happens? You call support, who will probably just confirm or deny that actions performed by them or their automated routines had anything to do with the corruption (if they were forced to uncleanly reboot the machine, for example, that might explain things). Usually, they’ll say they don’t have any record of any events on the server that might be an issue, and you’ll need to fix it yourself (that’s what you wanted “full root access” for in the first place, right?).

So, you get a system or database guy to look into things. He’ll find that there are no logs, that the configurations are broken, and that when he tries to make a change, it’s either overwritten by these wacky automated management routines, or it breaks some part of the web-based management interface. He’ll also find that, while your web site uses about 128MB of the 512MB of available RAM, the host is running software that takes up double that amount of RAM. Wow, what a deal you got!

All of these issues, by the way, can also occur on dedicated servers, but what sets VPS services apart is the performance: it is, at the very best, unpredictable, and often bad. Some hosts try to market their way around this by charging you more money for “low-density” VPS “solutions”. Don’t buy it. It’s not a density issue. Even if you only share the hardware that runs your VPS with *one* other VPS, if that other VPS goes crazy and starts performing huge amounts of disk reads and writes, your site, even if there are only 3 people looking at it, is going to be slow.

The solution? Well, evaluate whether or not you really need the control a VPS gives you. If you’re just running WordPress, a simple CMS, or a brochure web site, you almost certainly don’t need a VPS. Get a web hosting plan. They often offer one-click installations of WordPress and CMSes like PostNuke, PHP-Nuke, Joomla, Drupal, etc., along with phpMyAdmin for doing database operations. LinuxLaboratory.org runs on Drupal and MySQL, and houses a bunch of articles about Linux, System/DB Administration, etc., that I’ve written over the years. It also presents a feed of the content on this blog, and it’s been running on a simple, cheap web hosting plan for probably 7 or 8 years now. My uptime is better than that of sites run by friends of mine who decided they needed the control of a VPS. Same goes for this blog (though it’s a different provider). Heck, my beer blog runs on a *free* web hosting solution at DreamHost. It’s not super fast, but aside from that it serves its purpose well, and they have one-click installations for just about everything.

If you need to launch some kind of site that requires things not offered by a web hosting plan, then chances are you’re developing the site, or have some budget or staff for helping you set up/manage/troubleshoot the services you’ll run there. Check out Amazon EC2 and Google App Engine, and look into dedicated hosting to see if any of those meet your needs.

If you have an IT department, you could, of course, try to work with them on a solution. This is almost always the best solution over the long haul.

Why should I pay for this AWS design decision?

Monday, June 23rd, 2008

I was writing a utility in Python (using boto) to test/play with Amazon’s SQS service. As boto isn’t particularly well documented where SQS specifically is concerned, I also plan to post some examples (either here or on LinuxLaboratory.org, or both). When I had some trouble getting a message that was sent to a queue, I went to the Amazon documentation, and found this little gem in the Amazon Web Services FAQ:

I am sure that my queue has messages, but a call to ReceiveMessage returned none. What could be the problem?

Due to the distributed nature of the queue, a weighted random set of machines is sampled on a ReceiveMessage call. That means only the messages on the sampled machines are returned. If the number of messages in the queue is small (less than 1000), it is likely you will get fewer messages than you requested. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response. Your application should be prepared to poll the queue until a message is received. Note that with the 2008-01-01 version of Amazon SQS, you’re charged for each request you make, so set your polling frequency with that in mind.

So… if you were planning to decouple application components with SQS under an ‘eventual consistency’ model, keep in mind that they’re using the same model, and that they’re charging you for the privilege of eventually getting messages you’ve already paid to put there, but which aren’t necessarily available at any given point in time. I personally think this is a little goofy, and wrong.

If I put a message in a queue, I should be charged for actually getting the message. I should *not* be charged for checking to see if Amazon’s internal workings have made my messages available to me yet.
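
To make the cost concrete, here’s a rough sketch of the kind of polling loop the FAQ says your application should be prepared to run. It deliberately doesn’t use boto: `receive` is a hypothetical stand-in for whatever call fetches messages from your queue, and `FlakyQueue` is a fake that just simulates a few empty responses. The thing to notice is that every empty response still counts as a request.

```python
import time


def poll_until_message(receive, max_attempts=10, delay=0.0, backoff=2.0):
    """Call receive() until it returns messages or attempts run out.

    Returns (messages, requests_made). Every call counts as a request,
    including the ones that come back empty.
    """
    requests_made = 0
    for _ in range(max_attempts):
        requests_made += 1
        messages = receive()
        if messages:
            return messages, requests_made
        time.sleep(delay)
        delay *= backoff  # wait longer after each empty response
    return [], requests_made


class FlakyQueue:
    """Fake queue that returns nothing for the first few calls (simulation only)."""

    def __init__(self, messages, empty_responses):
        self.messages = list(messages)
        self.empty_responses = empty_responses

    def receive(self):
        if self.empty_responses > 0:
            self.empty_responses -= 1
            return []
        messages, self.messages = self.messages, []
        return messages


queue = FlakyQueue(["hello"], empty_responses=3)
messages, requests_made = poll_until_message(queue.receive)
print(messages, requests_made)
```

In this simulation one message costs four receive requests. With a real queue you’d pass a non-zero `delay` so the backoff actually spaces the requests out, trading latency for fewer billable calls.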

A simple nanny script in Python

Saturday, March 29th, 2008

I have a support issue with a provider of mine, but was able to reverse engineer the problem and put in a stop-gap measure to keep it from ruining my weekend. The issue is a misconfigured daemon supplied by the provider, and occasionally, this daemon just goes away. I don’t know much about the daemon, but the underlying system is standard CentOS, so what I really needed was a way to detect if the daemon failed, and then restart it if that’s the case. The script that does this exists in every shop I’ve ever worked in, and is traditionally called a “nanny script”.

There are actually some nice-looking projects that deal with this issue and others, but I didn’t really have time to read all the docs (yet), and I wasn’t sure they weren’t overkill. Still, it might be nice to have a daemon instead of a script running from cron.

Anyway, I was shocked that I was unable to find a simple nanny script out on the web – in *any* language. Maybe my google-fu is out of whack. So I went ahead and wrote one up *very* quickly using Python. If you need a script to run every minute or few out of cron and restart a misbehaving daemon if it’s not running, feel free to use my nanny script.
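
For readers who just want the general shape of such a script, here’s a minimal sketch. It is not the script mentioned above, just an illustration under a few assumptions: modern Python 3, a system with `pgrep`, and a hypothetical daemon called `exampled` with a hypothetical restart command. The demo at the bottom uses a fake runner so nothing is actually executed.

```python
import subprocess
from types import SimpleNamespace


def is_running(name, run=subprocess.run):
    """Return True if pgrep finds a process with exactly this name."""
    result = run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def nanny(name, restart_cmd, run=subprocess.run):
    """Restart the daemon if it isn't running.

    Returns True if a restart was attempted, False if the daemon was up.
    """
    if is_running(name, run=run):
        return False
    run(restart_cmd)
    return True


def fake_run(cmd, **kwargs):
    """Stand-in for subprocess.run that pretends pgrep found nothing."""
    print("would run:", " ".join(cmd))
    return SimpleNamespace(returncode=1)


# Dry run with a hypothetical daemon name and restart command.
restarted = nanny("exampled", ["/sbin/service", "exampled", "restart"], run=fake_run)
print(restarted)
```

To use it for real, drop the `run=fake_run` argument, substitute your own daemon name and restart command, and call the script from cron every minute or few.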

LinuxLaboratory Overhaul and Relaunch Complete

Sunday, December 24th, 2006

So it’s all done. LinuxLaboratory.org now runs on Drupal, it runs on a new hosting service, and so far I’m very happy with both. 

The hosting service is a little more modern than my old one, is slightly faster, gives me access to far more resources than I’m likely to ever use, and costs about 70% less. 

Drupal is a CMS. It’s not a wiki, it’s not a blog, it’s there mainly to manage “content”. It doesn’t have a WYSIWYG editor, but that’s no big deal. I wound up having to add two modules, which I really didn’t want to have to do, but it handles things pretty nicely, and it was pretty much a breeze to get going. Moving over content was no problem – just a little time-consuming to fix the formatting. Getting a download section in place was a little more of a headache, but once I got the hang of it, it was nicer than anything else I’ve used.

In the end, I think Drupal’s ease-of-use (at least, the way I’m using it is easy) will be a catalyst to doing more with the site. 99% of the content is handled using the Drupal “book” module, so I don’t have to mess with “taxonomies”, for example. I don’t even know what that really is in the context of Drupal, and I’m happy to stay stupid in that regard. 

Anyway, it’s up and running, so go have a look and let me know what you think about it. 

New LinuxLaboratory.org Site Live

Friday, December 22nd, 2006

By now, DNS has propagated, and everyone should be able to see the new Drupal-ized LinuxLaboratory.org (LLO for short): http://www.linuxlaboratory.org

Just about all of the old content that is still relevant has been moved over, and some new content and a good number of useful downloads have also been added.

I’ve also added another site administrator named Chris St. Pierre. I’ve known Chris online for some time, and recently we got to meet in person at LISA ’06, where we co-hosted the Fedora Directory Server BoF. We’ve collaborated a bit on some code in the hopes of eventually making the FDS GUI obsolete, and that code is available on the site. It should be noted that the code isn’t FDS-specific – it’s just the catalyst for its creation :-)

Well, let me know what you think, and happy holidays!!

More CMS Requirements Than I Thought

Sunday, December 10th, 2006

I thought my needs were simple. When I started LinuxLaboratory.org, it was full of features. User forums, news categories, icons and emoticons everywhere, downloads, interviews… it was really all-singing and all-dancing. It was also too much for one guy to manage.

I decided to trim the fat and get back to basics. LinuxLaboratory.org started as a place for me to keep notes for myself. Others found the notes useful, and I was asked to write an article or two for other sites. Then I started writing my notes in the form of articles. Then I started writing LOTS of articles all the time. I also wrote some code, and saw no reason to keep it to myself. So, what I need is a place to keep articles, and a place to keep downloads that others can get to.

My requirements? Well, I need a CMS that allows me to create navigation that is very article-centric. I want users who come to the site to see the categories of articles so they can find what they want quickly. I want users to click on an article category and see the article titles available in that section. I also want a link on the front page to a download section where people can then see a list of available downloads.

I don’t want much more than that. I don’t want to learn about inane taxonomies, I don’t want things listed chronologically, I don’t want a framework that allows a million people to contribute. I don’t want a wiki, I don’t want a blog, I don’t want a news portal.

What I want, I think, is to be able to structure content more or less like a book is laid out… online. A single-user site with content broken down by chapter and subchapter. When the content contains code, I’d like to make it available, either inline or via a download. As far as I can tell, this does not exist. Let me explain:

I’ve tried PHPX, Drupal, DokuWiki, MediaWiki, and WordPress, all within the last year or so. Looking back over 5 years, I’ve tried just about everything else as well. XOOPS, PHP-Nuke, PostNuke, Mambo, and the list goes on.

PHPX was just plain flaky, but was damn near perfect in terms of what I wanted to do with my content. The numerous bugs made me leave it.

Drupal is really nice too, and I’m still testing it, but the article formatting isn’t wonderful. I’ve found that if I insert PHP code inline using a ‘pre’ tag, the PHP gets parsed, and if I use the ‘code’ tag instead, I lose any notion of indentation. This is no good, but I’m still searching for a solution because otherwise Drupal seems kinda nice so far.

DokuWiki is nice, too. I really like that you can have syntax-highlighted code inline in your articles. I *don’t* like that you have to pick a string representation of your article that is not the title of the article. So, in other words, instead of seeing Linux->Scripting->More Power With Bash Getopts, I’m forced to live with Linux->Scripting->bash_getopts. It also wasn’t obvious to me how you’d link to a download without using an absolute “External” link, which, in the context of something that already does so much, seems like a hack.

MediaWiki is what LLO currently runs on, and I’ve learned over the past year that doing downloads and structuring things the way I want them in MediaWiki also involves hacks.

WordPress is nice, but again, there’s no obvious way to do downloads cleanly, and chronology in my content is really pretty irrelevant. I’m happy to date my articles, but the articles I’m posting relate to each other in ways that have nothing to do with their creation date. PHP and Shell articles written two years apart should still appear next to each other in the “scripting” section.

In the end, my recommendation to others is this: if you’re not hosting a blog, don’t use blog software. Not hosting a wiki? Don’t use wiki software. Not hosting a news portal? Don’t use news portal software.

Also, before I get flamed, note that I’m aware that I can probably load my site down with plugins to accomplish what I want. However, I’ve been doing this for a while, and I know that, while using a plugin will work for a while, there is also often a lag between the release of a new version of the base software and the release of the plugin for the new version of the base software.

If anyone has a clue about what I might use to accomplish my goal of basically providing categorized articles online with as little bloat as possible, let me know.

Yet another LinuxLaboratory overhaul in the works.

Friday, November 17th, 2006

Well, it’s that time again. Time to suck it up and come to terms with yet another “content management solution” that doesn’t fit the bill as far as LinuxLaboratory.org is concerned.

The latest system is MediaWiki, and it’s not working out. There are about a million things built into MediaWiki that I don’t use, and then some of the simple things I do want to use are either hacks or don’t work right, or both.

For example, MediaWiki, to my knowledge, doesn’t have *native* support for providing downloads through your site. You can do it using a hack that utilizes the “Images” tag, and I’ve done that on LinuxLaboratory, but recently, and without warning, the wiki markup stopped honoring my request to hide the hack in the actual title of the download, so now my downloads all say “Images:Downloadname” instead of just “Downloadname” as my wiki markup indicates.

Anyway, MediaWiki doesn’t really aim to satisfy the needs of sites like LinuxLaboratory. I just tried to repurpose it, and it hasn’t worked as well as I would’ve liked. I still actually use it for other projects that are more wiki-like, and it’s great, but not for this site.

So what’s next? I’m not really sure. There seems to be an unclaimed niche in content management. I can’t seem to find a solution that doesn’t consist of either some arbitrarily convoluted framework, or a bunch of hacked PHP scripts with no real flexibility. I don’t want to run a news site, I don’t want to run a blog. I run, primarily, a HOWTO-style documentation website that also provides downloads of various bits of code.

I’ve looked into handling this with WordPress, but there’s not a really good way to handle category navigation, or downloads, that’s *native*.

Why am I obsessed with native features? Well, because I’ve been shot in the foot in the past by installing whiz-bang extensions written for one version of the CMS-of-the-day, only to find they’re unusable when the CMS is upgraded, and that their authors are in no big rush to update the code.

I’ll be testing out a few systems, now that I’ve basically given up on WordPress. Open source projects use systems to host their software downloads and documentation, so I’ll look into some project home pages and see what they use.