Musings of an Anonymous Geek

Made with only the finest 1's and 0's

Simple S3 Log Archival

Posted on June 3, 2008 by bkjones

UPDATE: if anyone knows of a non-broken syntax highlighting plugin for wordpress that supports bash or some other shell syntax, let me know :-/

On busy web sites, Apache logs, database backups, and the like can get large. If you rotate logs or run backups regularly, they get large and numerous, and as we all know, large * numerous = expensive, rapidly filling disk partitions, or both.

Amazon’s S3 service, along with a simple downloadable suite of tools and a shell script or two, can ease your life considerably. Here’s one way to do it:

  1. Get an Amazon Web Services account by going to the AWS website.
  2. Download the ‘aws’ command line tool from here and install it.
  3. Write a couple of shell scripts, and schedule them using cron.

Once you have your Amazon account, you’ll be able to get an access key and secret key. You can copy these to a file and aws will use them to authenticate operations against S3. The aws utility’s web site (in #2 above) has good documentation on how to get set up in a flash.

With items 1 and 2 out of the way, you’re just left with writing a shell script (or two) and scheduling them via cron. Here are some simple example scripts I used to get started (you can add more complex/site-specific stuff once you know it’s working).

The first one is just a simple log compression script that gzips the log files and moves them out of the directory where the active log files are. It has nothing to do with Amazon web services. You can use it on its own if you want:

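#!/bin/bash

LOGDIR='/mnt/fs/logs/httplogs'
ARCHIVE='/mnt/fs/logs/httplogs/archive'

if cd "$LOGDIR"; then
    # Compress rotated logs selected by -mtime +1, skipping anything
    # already gzipped. Letting find run gzip avoids word-splitting
    # problems with unusual filenames.
    find . -maxdepth 1 -type f -name "*_log.*" ! -name "*.gz" -mtime +1 -exec gzip {} +

    # Move the compressed logs out of the active log directory.
    # The -e test keeps mv from failing when there is nothing to move.
    for f in ./*.gz; do
        [ -e "$f" ] && mv "$f" "$ARCHIVE/"
    done
else
    echo "Failed to cd to log directory" >&2
fi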

Before launching this in any kind of production environment, you might want to add some more features, like checking that the archive partition has enough space before trying to move things to it, but this is a decent start.
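As a rough sketch of the free-space check mentioned above, assuming a POSIX-style `df`: the helper name `enough_space` and the 1 GB threshold are made up for illustration, so adjust both to taste.

```bash
#!/bin/bash

# Hypothetical helper: succeed if the filesystem holding $1 has at
# least $2 kilobytes available. Relies on `df -Pk` output, where the
# fourth column of the second line is the available space in KB.
enough_space() {
    local dir=$1 needed_kb=$2 avail_kb
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    # An empty value means df failed (for example, no such directory).
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$needed_kb" ]
}

# Example: only archive when roughly 1 GB is free (arbitrary number).
ARCHIVE='/mnt/fs/logs/httplogs/archive'
if enough_space "$ARCHIVE" 1048576; then
    echo "Enough space in $ARCHIVE"
else
    echo "Not enough space in $ARCHIVE (or it does not exist)"
fi
```

In the compression script, a check like this could wrap the mv step so that nothing moves when the archive partition is nearly full.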

The second one is a wrapper around the aws ‘s3put’ command, and it moves stuff from the archive location to S3. It checks a return code, and then if things went ok, it deletes the local gzip files.

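#!/bin/bash

if cd /mnt/fs/logs/httplogs/archive; then
    for i in *.gz; do
        # With no matches the glob stays literal, so skip it.
        [ -e "$i" ] || continue
        # Upload to the bucket; delete the local copy only on success.
        if s3put addthis-logs/ "$i"; then
            echo "Moved $i to s3"
            rm -f "$i"
        else
            echo "Failed to move $i to s3... Continuing"
        fi
    done
else
    echo "Failed to cd to archive directory" >&2
fi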

I wish aws had a way to check whether an object exists in a bucket without trying to cat the file to stdout, but I don’t think it does. That would be a more reliable check than the return code alone. I’ll work on that at some point.

Scheduling all of this in cron is an exercise for the user. I purposely created two scripts to do this work, so I could run the compression script every day, but the archival script only once a week or so. You could also write a third script that checks disk usage on your log partition and runs either or both of these other scripts if usage gets too high.
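Here is one way that third script might look. It is only a sketch: the helper name `usage_percent`, the 80% threshold, and the two script paths are placeholders I made up, so substitute wherever you actually saved the scripts above.

```bash
#!/bin/bash

# Hypothetical helper: print the percentage of space used on the
# filesystem holding $1 as a bare integer, or nothing if df fails.
usage_percent() {
    df -Pk "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

LOGDIR='/mnt/fs/logs/httplogs'
THRESHOLD=80   # arbitrary example value

used=$(usage_percent "$LOGDIR")
if [ -z "$used" ]; then
    echo "Could not read disk usage for $LOGDIR"
elif [ "$used" -gt "$THRESHOLD" ]; then
    echo "Log partition is ${used}% full; compressing and archiving"
    # Placeholder paths for the compression and archival scripts.
    /usr/local/bin/compress-logs.sh && /usr/local/bin/archive-to-s3.sh
else
    echo "Log partition is ${used}% full; nothing to do"
fi
```

Run from cron (say, hourly), this acts as a safety net between the regularly scheduled runs of the other two scripts.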

I used ‘aws’ because it was the first tool I found, by the way. I have only recently found ‘boto’, a Python-based utility that looks like it’s probably the equivalent of the Perl-based ‘aws’. I’m happy to have found that and look forward to giving it a shot!

  • http://standalone-sysadmin.blogspot.com Matt Simmons

    This is interesting.

    What sort of things do you use AWS for? Are you using S3 and EC2?

  • m0j0

    Hi Matt,

    I use S3 for log archival, and soon I’ll be moving database backups (old ones) to S3 as well. Other things might migrate there over time as I become better acquainted with the service. I’m still not confident enough to host anything “live” there, but I know people are doing that.

    I started out by diving head first into Hadoop, S3, EC2… the works. What I found is that it requires you to really immerse yourself in the ways of AWS. It’s a great service, but there’s a lot of stuff that isn’t done – mostly in the area of administrative tools. I also had a couple of conceptual problems relating to failover/redundancy/architecting-for-failure within the EC2 service environment, and I had so many other things on my plate (still do, unfortunately) that this has been put on the back burner for the time being.

    Things are progressing rapidly in EC2-land. Soon we’ll have persistent storage, which actually solves a number of the other issues I had with EC2 (via hacks that would rely on persistent storage), and IP addresses that are at least somewhat predictable. People are also writing better tools to manage all of this stuff, and there’s more reading material about how to make running services in that environment a less sanity-depleting experience. 🙂

  • http://standalone-sysadmin.blogspot.com Matt Simmons

    Cool, thanks for the rundown on it!

    I’m going to keep an eye on this. We recently invested in 20 blades for our primary and secondary sites, and a 50k SAN, so I don’t think we’d use it this round, but in the future I can see where this would be very handy.

    How long have you been using it?

  • http://bzimmer.ziclix.com Brian Zimmer

    For the syntax highlighting, you might want to try Pygments, which claims to support bash. I wrote about my use of it here though I don’t use it as a plugin.

  • http://weblog.bluepenguin.us Paul Holbrook

    It’s a clever idea, but how cost effective is it? At .15/GB/month, 100 gig of S3 storage costs you $180 a year.

    I guess it depends on your alternatives. Certainly commodity hard drives are far cheaper, but enterprise level SAN/NAS storage is much more.
