Wednesday, August 24, 2011

It's Alive! Cleaning up broken URLs with MVC Routing

If your website has been around for a while, then you've probably got some dead links out there on the web. You may have changed your site's folder structure a few times, or changed technologies from classic ASP, to PHP, to ASP.NET Web Forms, to MVC. I've had to tackle this problem continuously, since my site Connemara.net, maintained more or less as a hobby at this stage, has been around since 1996.

One of the benefits of using ASP.NET MVC is routing. Although routing is not confined to MVC, located as it is in System.Web.Routing, it is strongly associated with MVC. With ASP.NET Web Forms, URLs generally correspond to files on disk, so for example the address www.connemara.net/words/index.aspx?id=079 means there's a file called "index.aspx" in a top-level folder called "Words", and that it's going to look for something with an id of 079, most likely a record in a database table. It's the Pompidou Centre URL pattern, one where the skeleton is on the outside, for everyone to see.

Photo (Flickr): Pompidou Centre, by Edward Langley. Postmodernist icon, shit URL structure.

Routing, on the other hand, places resources front and centre. A resource is anything important enough to have its own address. Having URLs that reflect your resources is part of what's called the Resource-Oriented Architecture in 'RESTful Web Services', which you should read.

So, for example on Connemara.net the first 'Letter from Home' that my friend Eugene wrote for the 'words' section is a resource: it's something you'd make a link to, but one whose original address was http://www.connemara.net/words/letter/no.1.htm. That 'no.1.htm' file has long since stopped being served from that address. Which is a real pity, because ol' Euge wrote some nice stuff back then, and it'd be nice to preserve it.

The Crawl Errors page in Google Webmaster Tools is where links go to die, so you should buy some flowers and pay your respects from time to time. Your users, if you're lucky enough to have any, also tend to tell you all about your broken links. Connemara.net goes back to Oct. '96, so there are plenty of early defunct ".htm"s, ".html"s, and even ".tmpl"s littering the far corners of the web. Then there are what I call "The PHP Years". My first ever web scripting language. Good times. But now the party's over, and it's time to clean up the condoms. That's where MVC routing comes in. So how do you fix up a link like www.connemara.net/words/article.php?id=075?

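I can start by making a route template that contains the literal value "words" and catches the segment after it (and the forward slash). Routing only looks at the path, not the query string, so "/words/article.php?id=075" matches with the 'oldPath' parameter set to 'article.php'; the '?id=075' part is still there for the taking in Request.QueryString. Just map routes like these, the first for the old links and the second for the new-style URLs: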
// catches old-style links such as /words/article.php?id=075
routes.MapRoute("OldWordsArticle",
                "Words/{oldPath}",
                new { controller = "Words", action = "RedirectToArticle" } );

// the new-style URL, with an optional hyphenated-title slug
routes.MapRoute("WordsArticle",
                "Words/Articles/{id}/{slug}",
                new { controller = "Words", action = "Article", slug = UrlParameter.Optional } );
The first route, "OldWordsArticle", is what's called greedy. It'll catch any one-segment request to the Words folder, whatever the file name, and since the first matching route wins, it pays to watch where it sits relative to the more refined route patterns. Within WordsController, I fish the old id out of the request (the 075 in 'article.php?id=075'), look up what that article's new id is, and then redirect the request to the new route, 'WordsArticle'.
public ActionResult RedirectToArticle(string oldPath)
{
    // ResolveOldId() (not shown) extracts the legacy article id from the current request
    var oldId = ResolveOldId();

    // look up the article so we can generate its new URL
    var article = ArticleRepository.Search<Article>(oldId).FirstOrDefault();

    if (article != null)
    {
        return new RedirectToRouteResult("WordsArticle",
                                         new RouteValueDictionary {
                                             { "controller", "Words" },
                                             { "action", "Article" },
                                             { "id", article.Id },
                                             { "slug", article.Slug }
                                         });
    }

    // no article found?
    ViewBag.Message = "No article with an id of " + oldId + " found";
    return View("NotFound");
}
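
ResolveOldId() isn't shown above. Here's a minimal sketch of one way such a helper could work, assuming the legacy id always arrives as an all-digit "id" query-string value, as in article.php?id=075. The LegacyIds class and FromQueryString method are names I've made up for illustration; they're not part of MVC or of the site's actual code.

```csharp
using System.Collections.Specialized;

// Hypothetical helper: not part of ASP.NET MVC.
public static class LegacyIds
{
    // Returns the legacy article id from a query string collection,
    // or null when it is missing or not made up purely of digits.
    public static string FromQueryString(NameValueCollection query)
    {
        var raw = query["id"];              // null if there is no "id" key
        if (string.IsNullOrWhiteSpace(raw))
            return null;

        raw = raw.Trim();
        foreach (var c in raw)
        {
            if (c < '0' || c > '9')
                return null;                // reject anything that isn't 0-9
        }

        return raw;                         // keep leading zeros, e.g. "075"
    }
}
```

Inside the controller this would be called as LegacyIds.FromQueryString(Request.QueryString). Returning a string rather than an int is a guess on my part, made so that leading zeros like the one in 075 survive.
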
The main point here is that I'm using one route pattern to catch bad old links, and steering them to the correct route. Normally, having matched an incoming request to a route, you then hit a controller method which returns a view, which is a type of ActionResult. But in this case, I return a different type of result, a RedirectToRouteResult, which sends the browser off to request the new URL, so the whole routing process runs a second time. If you're not careful, and the new URL lands back on the route that did the redirecting, you could end up in an infinite route black hole and crash the internet.
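
One possible refinement: a plain RedirectToRouteResult answers with a temporary redirect (HTTP 302), whereas a link that has moved for good is usually better served by a permanent one (301), so search engines update their index. The sketch below assumes MVC 3 or later, where RedirectToRouteResult has a constructor taking a 'permanent' flag. The LegacyRedirects class and the int id are my own assumptions for illustration.

```csharp
using System.Web.Mvc;
using System.Web.Routing;

// Hypothetical helper: the class and method names are illustrative only.
public static class LegacyRedirects
{
    // Builds a permanent (301) redirect to the "WordsArticle" route.
    public static RedirectToRouteResult PermanentlyToArticle(int id, string slug)
    {
        var values = new RouteValueDictionary
        {
            { "controller", "Words" },
            { "action", "Article" },
            { "id", id },
            { "slug", slug }
        };

        // permanent: true makes MVC issue a 301 instead of a 302
        return new RedirectToRouteResult("WordsArticle", values, permanent: true);
    }
}
```

In RedirectToArticle() the return statement would then become something like return LegacyRedirects.PermanentlyToArticle(article.Id, article.Slug);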

So now if I browse to www.connemara.net/words/article.php?id=075, the "OldWordsArticle" route catches the request, RedirectToArticle() deals with it and redirects it to "WordsArticle", which knows how to serve up a normal ActionResult/View, with a brand spanking new, RESTy URL of http://www.connemara.net/words/articles/22/michael-gibbons--person-in-profile. "It's Alive! It's Alive!"
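
One gap worth flagging: a "Words/{oldPath}" template only matches a single segment after /words/, so an even older address like /words/letter/no.1.htm would slip straight past it. A catch-all parameter ({*oldPath}) picks up the whole remaining path instead. Here's a rough sketch, not what the site actually runs: the LegacyRouteConfig class and the "OldWordsCatchAll" route name are invented for the example. Because a catch-all really is greedy, it goes after the specific route here, so that new-style URLs get matched first.

```csharp
using System.Web.Mvc;
using System.Web.Routing;

// Hypothetical route registration: names are illustrative only.
public static class LegacyRouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        // Specific route first, so new-style URLs never reach the catch-all.
        routes.MapRoute(
            "WordsArticle",
            "Words/Articles/{id}/{slug}",
            new { controller = "Words", action = "Article", slug = UrlParameter.Optional });

        // Catch-all: oldPath receives everything after "Words/",
        // e.g. "letter/no.1.htm" or "article.php".
        routes.MapRoute(
            "OldWordsCatchAll",
            "Words/{*oldPath}",
            new { controller = "Words", action = "RedirectToArticle" });
    }
}
```

With this in place, RedirectToArticle(string oldPath) would have to cope with multi-segment values, and the query string still comes from Request.QueryString rather than from oldPath.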

EDIT 23/5/2012

Since I wrote this post, Connemara.net has been revamped, courtesy of Noel at Connemara Publications. I removed the links to the old Connemara.net pages that are mentioned, but the thrust of the post is unaffected.