Thursday, August 4, 2011

Aggregating content with Twitter and Google Reader

You can chain Twitter, Google Reader, and Google Buzz to get content into your site in an unholy mashup of REST, Atom, and WCF. Here's how I do it on my site, Connemara.net.

In an attempt to inject some much-needed currency into Connemara.net, I have decided to emphasize events. The front page will soon be a rolling, blog-like list of what's on in Connemara, in reverse chronological order. Local events shall henceforth be first-class citizens of Connemara.net.

Even though the site has traditionally had plenty of its own content, the information for these events will be aggregated, in keeping with the way the site has functioned for the last seven or so years. Using Google Reader, I'll pull it in from other websites and also from Twitter. There is so much local event information out there: newspaper articles, local sites' blog entries, individuals' tweets, and so on. The challenge is to find it all and organize it so that it becomes useful for people.

[Photo: "welcome to arts week", by Kymberly Janisch, on Flickr]

First of all, how do you find out about local events? In my case, in 3 ways:
  1. Somebody associated with the event emails me.
  2. Somebody @ConnemaraNet follows tweets about the event.
  3. Somebody includes it in their site feed, which then appears in my Google Reader 'Connemara' subscriptions list.

In the case of item no.1, what usually happens is that someone writes to me about a local event (usually attaching a Word document and one or more photos) asking me to put it on the site. Jumping into action a week later, I create the news item and publish it to an address like www.connemara.net/News/2011/05/Biggest-Names-in-Irish-Sport-Come-to-Clifden. That gives me a URL I can then tweet, so in effect this becomes an item no.2. That reduces the list to just two: events I find out about in Reader, and events I find out about in Twitter.

Tweets are a few of my favourite things

So how do I identify tweets or Reader items as being of interest? The first thing to note is that everything has to end up in Reader (so that it can end up in Buzz!) so that means I have to have some way of telling Reader to get event-related tweets. What are the ways you can interact with tweets? You can favourite them. The idea here is to be unobtrusive. If you favourite a tweet, that particular preference is not shown anywhere else other than by browsing to your favourites. Who's going to do that? And anyway, so what if they do? It's perfectly unobtrusive. Then all we have to do is to get a feed of those favourites and stick it in Reader, and we've turned everything into an item no.3.

We got ourselves a Reader

Everything's been funnelled into Reader. The tweets that I've favourite'd are all obviously event-related, but the rest of the subscription items form a heterogeneous list of keyword-related news events, blog entries, and lordy-knows-what, so the actual event-related ones have to be cherry-picked manually. Furthermore, having been identified as an event item, they then have to be marked up with the correct metadata.

Browsing the list of unread items, I share any item that's about a local event, then add my metadata in the form of a comment, e.g. "Events (Name: Clifden Arts Week, Date: 10 Sep 2011 - 20 Sep 2011, Location: Clifden, Url: www.clifdenartsweek.ie)". Entering the metadata is ultimately the least scalable part of the whole operation. But it's also the part where my human intervention gives the most value.

In any aggregation process like this, the art lies in finding the boundary between automation and manual intervention. I could automate the process more and have slightly crappier event entries on my site, or I could spend more time on each one and have better entries. In the case of one recent local event, The Irish Times headline was "Holiday art auctions in Cork, Connemara". All I want for Connemara.net/Events is "Art Auction". So I have to enter that metadata or accept the original, less direct event title. Also, how could you scrape the dates? The idea, as ever, is to do the most development work up-front in order to do the least amount for each event. The gods of scale must be appeased. But there is a minimum amount of work that has to happen for each event. Sharing an item in Reader allows you to post a comment, and in this case I enter "Event (name:Art Auction, date: 2011 Aug 3, location: Ballynahinch)". If there were an official event URL it'd go in there too.
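To make the comment convention concrete, here is a minimal Python sketch of how such a comment could be turned back into structured data. It is not the code that runs on the site: the function names, the returned dictionary keys, and the two accepted date layouts are my assumptions, inferred only from the two example comments above.

```python
import re
from datetime import datetime

# Matches "Event (...)" or "Events (...)" and captures the text in brackets.
COMMENT_RE = re.compile(r"^\s*Events?\s*\((?P<body>.*)\)\s*$", re.IGNORECASE)

# The two date layouts seen in the example comments (assumed, not exhaustive).
DATE_FORMATS = ("%d %b %Y", "%Y %b %d")


def parse_date(text):
    """Try each known layout in turn; raise ValueError if none fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def parse_event_comment(comment):
    """Return a dict of event fields, or None if the comment is not an event."""
    match = COMMENT_RE.match(comment)
    if match is None:
        return None
    fields = {}
    for part in match.group("body").split(","):
        # Split on the first colon only, so a value may itself contain colons.
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    event = {
        "name": fields.get("name"),
        "location": fields.get("location"),
        "url": fields.get("url"),
    }
    if "date" in fields:
        # "start - end" is a range; a lone date starts and ends on the same day.
        start, sep, end = fields["date"].partition(" - ")
        event["start"] = parse_date(start)
        event["end"] = parse_date(end) if sep else event["start"]
    return event
```

Because the fields are split on commas, an event name containing a comma would break this sketch. That is a limit of the comment format itself as much as of the parser.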

Cross-posting to Buzz

Unfortunately there is no official Reader API, so the only way I can ultimately 'read' items that I have shared is to cross-post them to Google Buzz. There's no work to do here, though. As long as your Buzz account is 'connected' to your Reader account, activity on Reader will create posts in Buzz. And as long as no-one follows you in Buzz, there's no spam. Connemara.net has a Twitter account, but is not asking anyone to follow its posts on Buzz (or Reader, for that matter), so that process is effectively unobtrusive.

The REST is easy

At the end of all that, I've got a nice looking Buzz feed, rich in processed event items and ready to bring some order into this chaotic world. This feed is the raw material consumed whenever anyone visits Connemara.net/Events, or even just the front page. Consumption happens by means of the RESTful Buzz API in a process thrillingly similar to the one I've already explained in my earlier post about the sadly-defunct Google Maps Data API. That's one of the selling points of REST: the uniform interface makes the web more programmable.
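As a rough illustration of the consuming side, here is a Python sketch that pulls the title, link, and timestamp out of each entry in an Atom feed. It assumes the feed arrives as Atom. The sample document and the function name are hypothetical, and this is not the Buzz API's actual response or the code behind the site.

```python
import xml.etree.ElementTree as ET

# Namespace map for the standard Atom vocabulary.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Shared items (hypothetical sample)</title>
  <entry>
    <title>Holiday art auctions in Cork, Connemara</title>
    <link rel="alternate" href="http://example.com/articles/1"/>
    <updated>2011-08-03T10:00:00Z</updated>
  </entry>
</feed>"""


def entries_from_atom(xml_text):
    """Return one dict per <entry> with its title, link and updated stamp."""
    root = ET.fromstring(xml_text)
    items = []
    for entry in root.findall("atom:entry", ATOM):
        link = None
        for candidate in entry.findall("atom:link", ATOM):
            # A link with no rel attribute counts as "alternate" in Atom.
            if candidate.get("rel", "alternate") == "alternate":
                link = candidate.get("href")
                break
        items.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM),
            "link": link,
        })
    return items
```

Calling entries_from_atom(SAMPLE_FEED) gives a list with a single dictionary for the one sample entry. In a real run the XML text would come from an HTTP GET against the feed URL.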

On my events page, each event shows
  • The name of the event
  • The date(s)
  • The official URL, if there is one
  • Photo(s)
  • The location (name of place, like 'Clifden')
  • A colour-code indicating whether it's currently on, has finished, or is in the future

and I have big plans for:
  • Extra links from news, local sites, mentioning the event
  • YouTube videos
  • Social media content, mainly tweets
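The colour-coding in the first list comes down to comparing today's date with the event's start and end dates. A minimal Python sketch follows, in which the status labels and colour names are placeholders of mine rather than what the site actually uses:

```python
from datetime import date

# Placeholder colours; the real palette would live in the page's stylesheet.
STATUS_COLOURS = {"upcoming": "green", "on now": "orange", "finished": "grey"}


def event_status(start, end, today=None):
    """Classify an event as upcoming, on now, or finished (dates inclusive)."""
    if today is None:
        today = date.today()
    if today < start:
        return "upcoming"
    if today > end:
        return "finished"
    return "on now"


def event_colour(start, end, today=None):
    """Look up the display colour for an event's current status."""
    return STATUS_COLOURS[event_status(start, end, today)]
```

Passing today explicitly, as in event_status(date(2011, 9, 10), date(2011, 9, 20), today=date(2011, 9, 15)), keeps the function easy to test. Left out, it falls back to the current date.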

Cloud Caveat

One downside of using the cloud as a database like this is that I'm subject to the restrictions imposed by both the Twitter and Buzz APIs, most importantly in terms of how far back into the past I can go. Twitter is particularly parsimonious in this respect: you can only get your 20 most recent favourites using their API. The process I'm outlining here is suitable either for ephemeral, time-sensitive data like current events, where you don't care about the past, or as a first step before persisting that ephemera (using SQL Server, for example) once it has been read into your app. But that's for another post.