Archive for the ‘Ruby’ Category

Make old svn revision the current revision

Thursday, May 29th, 2008

I ran across an issue that my google-foo has had some trouble handling. Maybe what I did is the only way to do it, in which case maybe this will help someone in need… but I rather like to think that someone here will have a much nicer solution.

I use Subversion at most of the clients I work for, for most of the projects I work on. It’s even used in the production/editorial process at both Python Magazine and php|architect Magazine. For my own work, I’m still using it personally mainly because I haven’t had the time to really get a good grip on Git yet.

Though I’ve been using svn almost since it was created, I can’t find a clean way to revert to an earlier revision of a file, and then make that version of the file the new “current” version of the file. At first I thought I could just make a trivial change to the file and commit, but you can’t commit if the file is out of date with the repository (that’s why you’re supposed to run ‘svn up’ before committing – duh). I also didn’t see an option to any of the svn commands that looked like it would work, but maybe I missed something (do tell!).

What I wound up doing was moving the local copy of the file sideways, svn up’ing the file back down, moving the good copy back over (overwriting the one I just svn up’d), and committing. Nasty. It looks like this:

$ mv myfile.php myfile.php.bak
$ svn up myfile.php
$ mv myfile.php.bak myfile.php
$ svn commit myfile.php -m "All cleaned up"

It works, but it just seems like something that should be built into svn somehow. Those with sufficient clue are hereby solicited to share it :-)
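For what it’s worth, the closest thing to a built-in answer that I’m aware of is a reverse merge: you merge the unwanted revisions back out of your working copy and commit the result, so the old content becomes the new HEAD with no sideways-mv dance. Here’s a minimal, self-contained sketch – the repository, file name, and revision numbers are all invented for the demo, and it assumes svn and svnadmin are on your PATH:

```shell
#!/bin/sh
# Demonstrate reverting a file to an older revision via a reverse merge.
set -e
cd "$(mktemp -d)"
svnadmin create repo
svn checkout "file://$PWD/repo" wc
cd wc
echo "good version" > myfile.php
svn add myfile.php
svn commit -m "r1: the good version"
echo "bad version" > myfile.php
svn commit -m "r2: a change we regret"
svn up   # avoid merging into a mixed-revision working copy
# Reverse-merge r2 back out, restoring r1's content in the working copy...
svn merge -r 2:1 myfile.php
# ...then commit it as the new HEAD revision.
svn commit -m "All cleaned up"
cat myfile.php
```

The merge replays the r2 change in reverse onto the working copy, so the file is never “out of date” and the commit goes through cleanly.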

Social Media, The Future of News, and Data Mining

Friday, May 16th, 2008

I went to a very good panel discussion yesterday hosted by the Center for Information Technology Policy at Princeton University. There has been a conference going on there that covers a lot of the overlap between technology, law, and journalism, and the panel discussion yesterday, Data Mining, Visualization, and Interactivity was even more enlightening than I had anticipated.

The panel members included Matt Hurst, of Microsoft Live Labs, Kevin Anderson, blog editor for The Guardian, and David Blei, a professor at the Computer Science Dept., Princeton University. This made for a very lively discussion, covering a wide range of perspectives about social media, “what is news?”, how technology is changing how people interact with information (including news), how the news game is changing as a result (which was far more fascinating than it sounds), and how this unfathomably enormous stream of bits, enabled by lots of open APIs, feeds, and other data streams can be managed, mined, reduced, and presented in some value-added way (part of the value being the sheer reduction in noise).

Cool Tools for Finding News

Some of the tools presented by the panelists were new to me, and aside from being great tools for bloggers and other content publishers, they are excellent examples of how to make effective use of the data you have access to through APIs like the Digg API.


This one was presented by Matt Hurst. It’s pretty neat – it’s a tool that essentially charts blog buzz for a given phrase over time, and it even lets you compare multiple phrases, which is really interesting as well. Check it out here.

I’d like to know more about how it derives its metrics, but a couple of quick comparisons suggest it lines up, to some degree, with simple comparisons of the number of search results for different phrases on sites like Technorati and Bloglines. Interestingly, even though there appears to be a lot more data available at Technorati, in my very limited experimenting the percent difference between the search results for any two phrases appears to be similar on both sites, suggesting that Bloglines may be a representative sample of the Technorati data. More experimentation, of course, would be needed to lend any credibility whatsoever to that claim. It’s probably irrelevant anyway, because you can’t ask either service for any kind of historical data about search results :)


This has the potential to be really interesting. Right now, it lets you pick from several different terms, like “love”, “wish”, “think” and “feel”, and after clicking one of those, it’ll start producing a constantly updating stream of twitters that contain those words. If this experiment is successful, I would imagine they’d eventually enable the same service for arbitrary keywords, which would be really powerful, and quite a lot of fun!


Oh, how boring my life according to twitter is. I’m still in the schizophrenic stage of settling on one of these live ‘update your friends on what you’re doing whether they care or not’ services. Facebook, myspace, twitter, jaiku… there are too many. I’m trying out the imified route now to consolidate all the cruft. According to tweetwheel, there are more places to update my status at any given moment than there are people who give a damn what my status is.

Anyway, tweetwheel shows how you’re connected to people through twitter. If you have lots of followers and follow lots of people, the wheel is really exciting to look at, as demonstrated by Kevin Anderson, whose wheel is much more “robust” than mine – his is actually interesting to look at. At some point I’d like to see this idea expanded to cover other services like Facebook and even LinkedIn.

Digg Labs

You have to go to the Digg Labs site and see what people are doing with the Digg API. There are too many awesome utilities to cover them all here. It almost makes me wish I did fancy Flash UI stuff instead of back end data mining and infrastructure administration.

At a higher level…

Most of the discussion about social media seems to be about measuring buzz created by bloggers (at least where news/content publishing is concerned). However, although things have shifted dramatically in a ‘consumers are producers’ direction, prompting people to rethink the definition of news, the shift is driven as much by consumers who are still *only* consuming as by anyone else, and I didn’t see much in the way of tools that measure the interest of those people in any meaningful way. Perhaps the consensus is that the bloggers are a representative sampling of the wider internet readership? I don’t know. If that’s the case, I’d disagree.

I work for AddThis, which seeks to provide publishers of news and all kinds of other content with statistics that help them figure out not just which pages people happen to be landing on, but which ones they have elected to take a greater interest in, either by emailing it to a friend, adding it to their favorites, or posting it to digg, delicious, or some other service. Maybe some day there will be an AddThis API that’ll let you easily do even more interesting things with social media.

Can’t find the Ruby Kool-aid

Saturday, April 19th, 2008

I picked up the Apress book about Ruby for Systems Administrators, because I plan on learning enough Ruby to make it a viable tool if I ever have to use it. I still plan on doing that, but in reviewing the first couple of chapters, I don’t think I’m going to like Ruby very much, and I suspect I’ll have to be dragged kicking and screaming to use it. Maybe as I gain some practical experience with it, this feeling will subside.

I should say that I did all of my sysadmin programming in shell and perl for many years. I still use them somewhat regularly, if only to maintain existing code or to write code for clients who require Perl, or to work on systems that don’t have Python installed (yes, these exist. Basically, unless you’re using Linux, Python is not “just there”). If I have a choice or some say in the matter, I’ll tend toward Python if I’m writing new code. I’m pretty sure that all of the non-web code that I’m writing for AddThis, for example, is Python (web code is in PHP, and I do database development using straight-up SQL, because we’re using MySQL – if I were using PostgreSQL, there’s a chance I might use Python for that as well).

When I started using Python, I also looked at Ruby. It was one of those periods of time where I was pretty much out looking for a language to add to my repertoire. I had experience with Java, C/C++, PHP, Perl… enough languages that at this point I knew exactly what I wanted in a language: I wanted a language that I could use no matter what I was doing. Of course, I do mostly systems coding, but I wanted a language that would be enjoyable enough to use, and agile enough, and capable enough, and available on enough platforms, and <long list here> that I could use it for web and perhaps even GUI scripting as well (I tend to avoid scripting GUIs in any language because I just think going straight to the web skips an oft unnecessary step in the evolutionary cycle).

Turns out, Python won because it happened to be in heavy use on a couple of projects I was loosely involved in at my job at the time. However, I’ll state that I volunteered to be involved in them so I could get a better grasp of Python, and I’m glad I did. My feeling at that time was that Ruby looked too much like Perl to me, and I was really making an effort to move away from Perl, because after 10 years of coding in a language, you should not still feel like a complete neophyte every time you have to use it.

After more reading about Ruby, I’ve discovered that it actually looks even more cryptic to me than Perl does: it supports things that don’t exist in Perl, yet tries to adhere to the Perl-ish notion that symbols make good visual indicators of various things – even though the same symbols mean completely different things in the two languages.

This is completely a “doesn’t fit my brain” thing. I’m sure Ruby is a fine language, and if I made regular enough use of it, I’d probably get used to parsing all of the symbols (though the same could be said for Perl and even shell as well). Python has some weirdness to it as well, which also took some getting used to. Unfortunately, I don’t really have a project to use Ruby on right now, and nothing forcing me to use it, and no existing projects that would be easy for me to involve myself in. I guess I’ll just have to find some sysadmin task that needs doing and see how Ruby works out. My primary language, though, is likely to remain Python, as I find it about 60 million times easier to read than Perl, and 100 million times easier to read than Ruby. I learn a lot by reading code, so I guess the ability to parse it readily is of primary importance to me in picking up a new language.

Vim messing up indentation on your pasted code?

Friday, April 4th, 2008

I’ve had this issue forever. I want to paste some code from somewhere else into a Vim session on some random box, but I have to remember to do “:set nocindent noautoindent nosmartindent noreallypleasedontindent” every time. Well, I finally had some time to google for an actual solution (I knew I couldn’t be the only person in the world with this problem), and I found one that is sooooo easy. Ready? Next time you wanna paste in some code, run this first:

:set paste

That’s it. Now paste in your code, and after the angels stop singing in your brain, you can get back your normal indentation settings with:

:set nopaste

As you might expect. :-)
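If you find yourself doing this a lot, Vim also has a ‘pastetoggle’ option that flips paste mode with a single keypress – a line like this in ~/.vimrc does it (F2 here is an arbitrary choice; pick any key you have free):

```vim
" Toggle the 'paste' option with F2, in both normal and insert mode
set pastetoggle=<F2>
```

With ‘showmode’ on, Vim displays “-- INSERT (paste) --” in the status line while paste mode is active, so you can always tell which state you’re in.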

Recovering deleted files from an svn repository

Thursday, October 25th, 2007

I know I’m going to forget how to do this, because I only ever need to do it once a year or something, so I’ll put it here for safe keeping:

To recover a file that you deleted from your local working copy (and committed the deletion of), you first need the proper name of the file and the last revision of the repository it existed in. To find those (assuming you don’t know them, because if you do, you have bigger issues), go to the directory the file was in (or as close to it as you can get) and run:

> svn log --verbose

You should be able to find the file you’re looking for and the revision you need in the output of that command. Assuming your file’s name is ‘file.txt’ and it was in revision 250, you run the following to recover it:

> svn up -r 250 file.txt

Done. It’s there waiting for you. Enjoy. I had been fumbling around with ‘svn co’ syntax until a digital buddy of mine corrected me. Thanks, Nivex!
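One caveat worth noting: the copy that ‘svn up -r’ gives you is pinned at that old revision, so the next plain ‘svn up’ will delete it again. If you want the file back in the repository for good, history and all, ‘svn copy’ from the old revision does that. A self-contained sketch – the repository, file name, and revisions are invented for the demo, and it assumes svn and svnadmin are installed:

```shell
#!/bin/sh
# Demonstrate resurrecting a deleted file at HEAD with 'svn copy'.
set -e
cd "$(mktemp -d)"
svnadmin create repo
REPO="file://$PWD/repo"
svn checkout "$REPO" wc
cd wc
echo "precious data" > file.txt
svn add file.txt
svn commit -m "r1: add file.txt"
svn delete file.txt
svn commit -m "r2: delete file.txt by mistake"
svn up   # bring the working copy uniformly to HEAD
# Copy the file back out of r1 (the @1 peg pins the source revision)...
svn copy "$REPO/file.txt@1" file.txt
# ...and commit the resurrection; the file is back at HEAD, history intact.
svn commit -m "r3: resurrect file.txt"
cat file.txt
```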

Freebase: Your database is ready!

Sunday, May 13th, 2007

This is going to be really frickin’ cool. There’s just no other way to put it. Maybe I’m a little too much of a data geek, because I haven’t been able to sit still since receiving the email letting me know that Freebase is now in alpha, and that the account I requested months ago can now be activated. I logged in and immediately started poking around. I’ve been doing that for about 48 hours straight now.

What is ‘Freebase’?

Well, the short answer is that Freebase is a public domain relational database maintained by the community. If this sounds like Wikipedia, don’t get too attached to that comparison. It’s true that Wikipedia is also maintained by its users, but that’s where the similarities end. You see, while Wikipedia stores information in a way that makes it attractive and easy for humans to find things, Freebase provides the kind of structure and relational characteristics that make it useful to application developers (programmatic access). It provides a relational database, which is typically used by programs, instead of an encyclopedia, which is used by people.

If you’re a DBA, your first thought might be that these are people who are trying to take your job. Not so. This is in no way suitable for internal, private, corporate, proprietary data. In fact, I don’t believe it’s even allowed. What it *is* good for is applications that can make use of publicly available and/or publicly maintained data. For example, a sample application called “Concierge” allows users to browse restaurants in their area by first telling the application the area they live in, and then the type of restaurant they’re looking for. The data about the restaurants is all stored in the publicly maintained Freebase database, and Concierge also provides an interface for users to add new restaurants, which adds the data back to Freebase.

Me as a working example of a typical Freebase geek

I myself am fleshing out the Freebase “beer”, “beer style”, and “beer style category” types, in the hopes that I can provide a web interface that allows beer enthusiasts and brewers alike to know more about beer, and helps them to develop recipes for their own beer. I’m using the BJCP (Beer Judge Certification Program) published beer style guidelines to flesh things out, and then other people can come in and start associating beers with styles and breweries and all kinds of other characteristics. Having the beer style definitions held in a public place means that, as the style guidelines evolve, people who care about such things are free to make updates to the data in Freebase. This means the data used by my application is always up to date, and I never have to push out updates to users just to update their local copy of this data.

It may also mean that I can afford to make the application available to large numbers of people for free, since I won’t have to find a hosting plan that lets me house however many gigabytes of data this thing grows to. I can eventually work in all of the properties of different hop varieties, different grain characteristics, yeast attenuation rates, water profiles of different beer brewing regions, etc. Further, I don’t have to be an expert on every facet of beer, because people who care about, say, yeast attenuation are probably going to populate that data for me anyway, whether it’s specifically for my application, or one of their own.

The Future of Freebase (Pre-“Total World Domination”)

Aside from simple publicly maintained data, I think there are other implications of this model. For example, with all kinds of applications using Freebase as the back end database, and considering that users can have their own “private domain” data type definitions, it makes sense for applications to use the users’ Freebase credentials to maintain application preferences data using that domain. This would seem to make Freebase a contender for a de facto standard OpenID or CAS portal.

The model may be Earth-shaking, but the current state of the actual user interface stands in stark contrast. Oh, I’m getting by just fine, but that’s more in spite of the interface than because of it. Some of the ajax-y features hurt as often as they help, navigation needs a little improvement, and I haven’t yet found a way to add massive amounts of data quickly without writing code.

Further, it’s still unclear to me how they plan to foster cooperation between people who want to relate different data types. For example, I quite naturally want to list, for each beer, the brewery that makes the beer. Someone else has already created a “brewery” type, and the community has done a darn good job at fleshing out the data for that type. However, when you go and look at a brewery definition, there is no listing of the beers produced by that brewery. Freebase certainly supports the idea of a “reciprocal link” that would cause beers to show up under “brewery” entries as people add beer definitions and fill in the “brewery” property of the beer. However, there are no clear rules on how to get this reciprocal link to happen if you’re not the creator of all of the types involved.

What’s more, I’m not the only person who has created a “beer” type. Which one should the “brewery” type administrator link to? Well, this wouldn’t be as big a problem if I were allowed to add a property to an existing beer type! Then I wouldn’t have to create a competing type at all! Currently, this is not allowed. I cannot go redefining the properties associated with a type that’s maintained by someone else. As a result, in order to support properties of beer that brewers and enthusiasts care about, I have to strike out on my own and hope that in the long run, my “beer” type becomes “the” beer type.

This should really all be opened up, and people should be allowed to add properties that are submitted for approval by the type administrator. Reciprocal links should be put in a “pending” state, or maybe even a “probationary” state. These are features that would encourage more interaction between users who care about the same data, and foster a community around the data that community cares about.

I’m sure there are plenty of other things to think about as well. For example, will Freebase let me upload the Briess malt profiles, published by one of the biggest maltsters in the US? Briess may have a problem with that – but how will Freebase know without receiving a cease and desist from some friendly neighborhood lawyers? Then there are technical and financial details. Presumably, they’ll either charge applications that use Freebase for commercial gain, or they’ll have to charge for some higher service level in order to guarantee that data will be available for applications to use.

This is not a simple service. I’ll say this though: I wish I could buy stock in Freebase, if only to cash out when they are inevitably purchased by Google.


Quick CVS Cheat Sheet

Thursday, April 12th, 2007

Actually, I’d love to say that I’ve moved completely to using git instead of CVS, but the truth of the matter is that, for a recent project where I’m just trying to consolidate a whole bunch of admin scripts, organize them under one (managed) roof, and (most importantly) get a bunch of admins on board using it, CVS is really probably the way to go. Besides, there’s no branching and merging going on, all of the development is internal, and for simple stuff like this, CVS is really just fine.

The one thing I used to hate about CVS is that it seemed like every time I switched gears from doing some project not involving it to one that used it heavily, I would have to read the whole manual all over again. One thing that helps me remember things is writing about them, so here’s a cheat sheet full of things that I do either regularly, or occasionally enough to be annoying 😉

For the duration of this post, we’ll work with a project called ‘foobrew’, which lives on a remote CVS server; in the commands below, substitute your own repository location (something like ‘:ext:user@cvshost:/cvsroot’) for the argument to ‘-d’. This isn’t meant to be a total n00b intro to CVS, so I’ll assume you know the basics, like how to set your CVS_RSH environment variable.

Checking out a repository, or part of one

If you just want to check out the entire project from a remote CVS server, you’d simply run this command:

cvs -d :ext:user@cvshost:/cvsroot checkout foobrew

This will download every file in the source tree into a directory called ‘foobrew’ under the one you ran the command from. One of the projects I’ve imported into CVS has various modules that are worked on by different people. If someone only wants to write code for one module, they can easily check out just that module from the repository. For example, to get the ‘malt’ module from the ‘foobrew’ repository, you could run this command:

cvs -d :ext:user@cvshost:/cvsroot checkout malt

Of course, maybe the module hasn’t yet been defined, but you know that there’s a directory called ‘malt’ in your repository, and that’s what you’re interested in. Well, in that case, you just alter the above command a bit:

cvs -d :ext:user@cvshost:/cvsroot checkout foobrew/malt

Both of those commands result in a directory being created locally called ‘malt’.

Adding to a repository with ‘cvs add’… or not

You don’t *have* to use ‘cvs add’ to add stuff to a repository, but sometimes it’s the easiest way to do what you need. If you have already checked out the project, and you want to add a file to the project, it’s not enough to just copy the file into the right directory. You have to tell the CVS server that you want the file to be managed using CVS, using ‘cvs add’, and then you have to actually check the file in using ‘cvs commit’. So, once you’ve copied the file into place, you would do this:

cvs add README.txt

Followed by

cvs commit README.txt

If the file you want to add is a binary file, only the syntax of the ‘cvs add’ command changes:

cvs add -kb somefile.bin

Using ‘-kb’ tells the CVS server to treat the file as binary, disabling the keyword expansion and line-ending translation that could otherwise corrupt the file the next time you check it out of the repository.

The above assume that you’ve checked out the repository, which you might not want to do. For example, if you have a directory full of code you’d like to add to an existing repository, you can add it without first doing a checkout like this:

cvs -d :ext:user@cvshost:/cvsroot import foobrew/mycode bkj start

This will create the ‘mycode’ directory under our top-level project directory on the remote server. The ‘bkj’ and ‘start’ arguments are a vendor tag and a release tag, respectively, and are required whether you make use of them or not. The use of these tags is beyond the scope of this document, but I’ll write about it in another one later :-) In general, if you need to use these tags, start looking at using git ;-P

Another option that might be easy enough is to just check out the top level directory of the project, and then use ‘cvs add’ to add your files. If you only want to check out the top level directory of the project, run this command:

cvs -d :ext:user@cvshost:/cvsroot checkout -l foobrew

The ‘modules’ file

If you want to find out what modules are available, or you want to create a module, you can check out the ‘CVSROOT/modules’ file from any CVS repository, and then inspect it or edit it. Since you’d presumably inspect it before editing it, I’ll show here how to create a simple module:

  • cvs checkout CVSROOT/modules
  • cd CVSROOT
  • open the ‘modules’ file, and add a line like this to define a module called ‘hops’
    • hops foobrew/hops
  • save the file, and run cvs commit modules to commit the change.
  • cd ..
  • run cvs release -d CVSROOT to release the CVSROOT directory and remove it from your working copy (since it isn’t of general use on a day-to-day basis).

Overriding Credentials

One nice thing about CVS is that you can enforce accountability with it: give everyone an account on the CVS server and you can see who made what changes, and when. On one project, the code needs to be checked out to a directory where only root has write access, so we created a read-only account on the server just for performing that checkout. But then what happens if I discover a problem in the working copy of the code? Generally you’d just make the edit in place and run ‘cvs commit’, but that won’t work here, because by default CVS uses the credentials that were used to check out the code – in this case, the read-only account. Instead, I need to provide my *own* credentials on the command line to override the ones that were stored locally during the checkout. This is done just like any other command that takes a repository location; I’m putting this note here to remind myself that it’s possible:

cvs -d :ext:myuser@cvshost:/cvsroot commit
