Skip to content

Musings of an Anonymous Geek

Made with only the finest 1's and 0's

Menu
  • About
  • Search Results
Menu

Social Media, The Future of News, and Data Mining

Posted on May 16, 2008 by bkjones

I went to a very good panel discussion yesterday hosted by the Center for Information Technology Policy at Princeton University. There has been a conference going on there that covers a lot of the overlap between technology, law, and journalism, and the panel discussion yesterday, Data Mining, Visualization, and Interactivity was even more enlightening than I had anticipated.

The panel members included Matt Hurst, of Microsoft Live Labs, Kevin Anderson, blog editor for The Guardian, and David Blei, a professor at the Computer Science Dept., Princeton University. This made for a very lively discussion, covering a wide range of perspectives about social media, “what is news?”, how technology is changing how people interact with information (including news), how the news game is changing as a result (which was far more fascinating than it sounds), and how this unfathomably enormous stream of bits, enabled by lots of open APIs, feeds, and other data streams can be managed, mined, reduced, and presented in some value-added way (part of the value being the sheer reduction in noise).

Cool Tools for Finding News

Some of the tools presented by the panelists were new to me, and aside from being great tools for bloggers and other content publishers, there are some excellent examples of how to make effective use of the data you have access to through APIs like the Digg API.

BlogPulse

This was presented by Matt Hurst. It’s is pretty neat – it’s a tool that essentially charts blog buzz of a given phrase over time, and it even lets you compare multiple phrases, which is really interesting as well. Check it out here.

I’d like to know more about how it derives the metrics, but in doing a couple of quick comparisons using the tool, it seems to line up to some degree with simple comparisons of the number of search results for different phrases on sites like technorati and bloglines. Interestingly, even though there appears to be lots more data available at Technorati, in my very limited experimenting, the percent difference between search results for any two phrases appears to be similar, indicating that bloglines may be a representative sampling of technorati data. More experimentation, of course, would be needed to lend any credibility whatsoever to that claim. It’s probably irrelevant, because you can’t ask either service for any kind of historical data regarding search results 🙂

Twistori

This has the potential to be really interesting. Right now, it lets you pick from several different terms, like “love”, “wish”, “think” and “feel”, and after clicking one of those, it’ll start producing a constantly updating stream of twitters that contain those words. If this experiment is successful, I would imagine they’d eventually enable the same service for arbitrary keywords, which would be really powerful, and quite a lot of fun!

Tweetwheel

Oh how boring my life according to twitter is. I’m still in the schizophrenic stage of settling on a live ‘update your friends on what you’re doing whether they care or not’ services. Facebook, myspace, twitter, jaiku… there are too many. I’m trying out the imified route now to consolidate all the cruft. According to tweetwheel, there are more places to update my status at any given moment than there are people who give a damn what my status is.

Anyway, tweetwheel shows how you’re connected to people through twitter. If you have lots of followers and follow lots of people, the wheel is really exciting to look at, as displayed by Kevin Anderson, who has a much more “robust” wheel than me — it’s actually interesting to look at. At some point I’d like to see this idea expanded to cover the other services like Facebook and even LinkedIn.

Digg Labs

You have to go to the Digg Labs site and see what people are doing with the Digg API. There are too many awesome utlities to cover them all here. It almost makes me wish I did fancy Flash UI stuff instead of back end data mining and infrastructure administration.

At a higher level…

Most of the discussion about social media seems to be about measuring buzz created by bloggers (at least where news/content publishing is concerned). However, although things have shifted dramatically in a ‘consumers are producers’ direction, causing people to start rethinking the definition of news, this shift is caused as much by consumers who are still *only* consuming as anyone else, and I didn’t see much in the way of tools that measure the interest of those people in any meaningful way. Perhaps the consensus is that the bloggers are a representative sampling of the wider internet readership? I don’t know. I would disagree with that if it were the case.

I work for AddThis.com, which seeks to provide publishers of news and all kinds of other content with statistics that help them figure out not just what pages people happen to be landing on, but which ones they have elected to take a greater interest in, either by emailing it to a friend, adding it to their favorites, or posting it to digg, delicious, or some other service. Maybe some day there will be an AddThis API that’ll let you easily do even more interesting things with social media.

Share this:

  • Click to share on X (Opens in new window) X
  • Click to share on Reddit (Opens in new window) Reddit
  • Click to share on Tumblr (Opens in new window) Tumblr
  • Click to share on Facebook (Opens in new window) Facebook

Recent Posts

  • Auditing Your Data Migration To ClickHouse Using ClickHouse Local
  • ClickHouse Cheat Sheet 2024
  • User Activation With Django and Djoser
  • Python Selenium Webdriver Notes
  • On Keeping A Journal and Journaling
  • What Geeks Could Learn From Working In Restaurants
  • What I’ve Been Up To
  • PyCon Talk Proposals: All You Need to Know And More
  • Sending Alerts With Graphite Graphs From Nagios
  • The Python User Group in Princeton (PUG-IP): 6 months in

Categories

  • Apple
  • Big Ideas
  • Books
  • CodeKata
  • Database
  • Django
  • Freelancing
  • Hacks
  • journaling
  • Leadership
  • Linux
  • LinuxLaboratory
  • Loghetti
  • Me stuff
  • Other Cool Blogs
  • PHP
  • Productivity
  • Python
  • PyTPMOTW
  • Ruby
  • Scripting
  • Sysadmin
  • Technology
  • Testing
  • Uncategorized
  • Web Services
  • Woodworking

Archives

  • January 2024
  • May 2021
  • December 2020
  • January 2014
  • September 2012
  • August 2012
  • February 2012
  • November 2011
  • October 2011
  • June 2011
  • April 2011
  • February 2011
  • January 2011
  • December 2010
  • November 2010
  • September 2010
  • July 2010
  • June 2010
  • May 2010
  • April 2010
  • March 2010
  • February 2010
  • January 2010
  • December 2009
  • November 2009
  • October 2009
  • September 2009
  • August 2009
  • July 2009
  • June 2009
  • May 2009
  • April 2009
  • March 2009
  • February 2009
  • January 2009
  • December 2008
  • November 2008
  • October 2008
  • September 2008
  • August 2008
  • July 2008
  • June 2008
  • May 2008
  • April 2008
  • March 2008
  • February 2008
  • January 2008
  • December 2007
  • November 2007
  • October 2007
  • September 2007
  • August 2007
  • July 2007
  • June 2007
  • May 2007
  • April 2007
  • March 2007
  • February 2007
  • January 2007
  • December 2006
  • November 2006
  • September 2006
  • August 2006
  • July 2006
  • June 2006
  • April 2006
  • March 2006
  • February 2006
  • January 2006
  • December 2005
  • November 2005
  • October 2005
  • September 2005
  • August 2005
  • July 2005
  • June 2005
  • May 2005
  • April 2005
  • March 2005
  • February 2005
  • January 2005
  • December 2004
  • November 2004
  • October 2004
  • September 2004
  • August 2004
© 2025 Musings of an Anonymous Geek | Powered by Minimalist Blog WordPress Theme