Wednesday, August 24, 2011

It's Alive! Cleaning up broken URLs with MVC Routing

If your website has been around for a while, then you've probably got some dead links out there on the web. You may have changed your site's folder structure a few times, or changed technologies from classic ASP to PHP to ASP.NET Web Forms to MVC. I've had to tackle this problem continuously, since my site Connemara.net, maintained more or less as a hobby at this stage, has been around since 1996.

One of the benefits of using ASP.NET MVC is routing. Although routing is not confined to MVC, located as it is in System.Web.Routing, it is strongly associated with MVC. With ASP.NET Web Forms, URLs generally correspond to files on disk, so for example the address www.connemara.net/words/index.aspx?id=079 means there is a file called "index.aspx" in a top-level folder called "Words", and that the page is going to look for something with an id of 079, most likely a record in a database table. It's the Pompidou Centre URL pattern, one where the skeleton is on the outside, for everyone to see.

Photo (Flickr): Pompidou Centre, by Edward Langley. Postmodernist icon, shit URL structure.

Routing, on the other hand, places resources front and centre. A resource is anything important enough to have its own address. Having URLs that reflect your resources is part of what's called the Resource-Oriented Architecture in 'RESTful Web Services', which you should read.

So, for example on Connemara.net the first 'Letter from Home' that my friend Eugene wrote for the 'words' section is a resource: it's something you'd make a link to, but one whose original address was http://www.connemara.net/words/letter/no.1.htm. That 'no.1.htm' file has long since stopped being served from that address. Which is a real pity, because ol' Euge wrote some nice stuff back then, and it'd be nice to preserve it.

Google Webmaster Tools' Crawl Errors page is where links go to die, so you should buy some flowers and pay your respects from time to time. Your users, if you're lucky enough to have any, also tend to tell you all about your broken links. Connemara.net goes back to Oct. '96, so there are plenty of early defunct ".htm"s, ".html"s, and even ".tmpl"s littering the far corners of the web. Then there are what I call "The PHP Years": my first ever web scripting language. Good times. But now the party's over, and it's time to clean up the condoms. That's where MVC routing comes in. How to fix up a link like www.connemara.net/words/article.php?id=075?

I can start by making a route template that contains the literal value "words" and captures the single segment after it (and the forward slash). Routing matches on the path only, not the query string, so "/words/article.php?id=075" matches, the 'oldPath' parameter gets the value 'article.php', and 'id=075' stays in the query string. Just map a route like this:
routes.MapRoute("OldWordsArticle",
                "Words/{oldPath}",
                new { controller = "Words", action = "RedirectToArticle" });

routes.MapRoute("WordsArticle",
                "Words/Articles/{id}/{slug}",
                new { controller = "Words", action = "Article", slug = UrlParameter.Optional });
The first route is what's called greedy. It'll catch any one-segment request to the Words folder as long as it's positioned before the more refined route patterns. Within WordsController, I pull the old id out of the request ('article.php?id=123'), look up what that article's new id is, and then redirect the request to the new route, 'WordsArticle'.
public ActionResult RedirectToArticle(string oldPath)
{
    // ResolveOldId is a helper not shown in this post; it extracts the
    // legacy article id from the incoming request.
    var oldId = ResolveOldId();

    // get the Id of the article to generate the correct URL
    var article = ArticleRepository.Search<Article>(oldId).FirstOrDefault();

    // This issues a temporary (302) redirect; MVC 3 also has a constructor
    // overload taking a 'permanent' flag for a 301.
    if (article != null)
        return new RedirectToRouteResult("WordsArticle",
                                         new RouteValueDictionary {
                                             { "controller", "Words" },
                                             { "action", "Article" },
                                             { "id", article.Id },
                                             { "slug", article.Slug }
                                         });

    // no article found?
    ViewBag.Message = "No article with an id of " + oldId + " found";
    return View("NotFound");
}
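The post doesn't show ResolveOldId or the lookup from old id to new id, so here is a minimal sketch of how that step might look. It assumes the old id arrives as an "id=" query-string value and that a small in-memory table maps old ids to new ones. LegacyArticleIds, ParseOldId and ResolveNewId are names made up for illustration, and the 75 to 22 pair just mirrors the example URLs in this post; a real site would read the mapping from its database.

```csharp
using System;
using System.Collections.Generic;

public static class LegacyArticleIds
{
    // Hypothetical mapping from PHP-era ids to current article ids.
    private static readonly Dictionary<int, int> OldToNew =
        new Dictionary<int, int> { { 75, 22 } };

    // Pulls the numeric "id" value out of text such as
    // "article.php?id=075", "?id=075" or "id=075&x=1".
    // Returns null when there is no numeric id.
    public static int? ParseOldId(string legacyUrlPart)
    {
        if (string.IsNullOrEmpty(legacyUrlPart)) return null;

        // Keep only what follows the '?', if there is one.
        int mark = legacyUrlPart.IndexOf('?');
        string query = mark >= 0 ? legacyUrlPart.Substring(mark + 1) : legacyUrlPart;

        foreach (string pair in query.Split('&'))
        {
            string[] parts = pair.Split(new[] { '=' }, 2);
            int id;
            if (parts.Length == 2
                && parts[0].Equals("id", StringComparison.OrdinalIgnoreCase)
                && int.TryParse(parts[1], out id))
            {
                return id; // leading zeros are fine: "075" parses to 75
            }
        }
        return null;
    }

    // Translates a legacy id to the current one, or null when unknown.
    public static int? ResolveNewId(int oldId)
    {
        int newId;
        return OldToNew.TryGetValue(oldId, out newId) ? newId : (int?)null;
    }
}
```

With this sketch, LegacyArticleIds.ParseOldId("article.php?id=075") gives 75 and LegacyArticleIds.ResolveNewId(75) gives 22, while an unknown id gives null, which is the case the "NotFound" view handles.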
The main point here is that I'm using one route pattern to catch bad old links, and steering them to the correct route. Normally, having matched an incoming request to a route, you then hit a controller method which returns a view, which is a type of ActionResult. But in this case, I return a different type of result, a RedirectToRouteResult, which tells the browser to make a fresh request to the URL generated from the named route. If that new URL were caught by the same legacy route again, you could end up in an infinite redirect black hole and crash the internet.
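Whether the redirect can loop comes down to which route the new URL matches. The toy matcher below is a sketch of first-match-wins ordering for the two templates in this post, not the real System.Web.Routing implementation: ToyRouter and its methods are invented for illustration, it ignores defaults and constraints, and it treats the slug segment as required.

```csharp
using System;
using System.Collections.Generic;

public static class ToyRouter
{
    // Route templates in registration order: name, then template.
    private static readonly KeyValuePair<string, string>[] Routes =
    {
        new KeyValuePair<string, string>("OldWordsArticle", "Words/{oldPath}"),
        new KeyValuePair<string, string>("WordsArticle", "Words/Articles/{id}/{slug}"),
    };

    // Returns the name of the first route whose template matches the
    // path of the URL, or null. The query string takes no part in matching.
    public static string Match(string url)
    {
        int mark = url.IndexOf('?');
        string path = mark >= 0 ? url.Substring(0, mark) : url;
        string[] segments = path.Trim('/').Split('/');

        foreach (var route in Routes)
        {
            if (Matches(route.Value.Split('/'), segments)) return route.Key;
        }
        return null;
    }

    private static bool Matches(string[] template, string[] segments)
    {
        // A {parameter} stands for exactly one segment, so counts must agree.
        if (template.Length != segments.Length) return false;

        for (int i = 0; i < template.Length; i++)
        {
            string part = template[i];
            bool isParameter = part.Length > 2
                && part[0] == '{' && part[part.Length - 1] == '}';

            if (isParameter)
            {
                if (segments[i].Length == 0) return false;
            }
            else if (!part.Equals(segments[i], StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }
        }
        return true;
    }
}
```

In this model ToyRouter.Match("/words/article.php?id=075") returns "OldWordsArticle", and the redirect target "/words/articles/22/michael-gibbons--person-in-profile" returns "WordsArticle", so the second request doesn't fall back into the legacy route. A deeper old link such as "/words/letter/no.1.htm" matches neither template here; routing's catch-all syntax ({*oldPath}) is the usual way to capture several segments, though a catch-all registered first would also swallow the new article URLs, which is exactly the loop to avoid.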

So now if I browse to www.connemara.net/words/article.php?id=075, the "OldWordsArticle" route catches the request, RedirectToArticle() deals with it and redirects it to "WordsArticle", which knows how to serve up a normal ActionResult/View, with a brand spanking new, RESTy URL of http://www.connemara.net/words/articles/22/michael-gibbons--person-in-profile. "It's Alive! It's Alive!"
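The slug on the end of that URL has to come from somewhere. The post doesn't show how article.Slug is produced, so this is only a sketch of one common approach: lowercase the title and join its runs of letters and digits with single hyphens. The Slug class, the FromTitle method and the sample title are all hypothetical. The real slug above contains a double hyphen, so the site's own generator evidently doesn't collapse separators the way this one does.

```csharp
using System;
using System.Text;

public static class Slug
{
    // Lowercases the title and joins runs of letters and digits with a
    // single hyphen. Punctuation and whitespace act only as separators,
    // and no hyphen is emitted at the start or the end.
    public static string FromTitle(string title)
    {
        var sb = new StringBuilder();
        bool pendingHyphen = false;

        foreach (char c in (title ?? string.Empty).ToLowerInvariant())
        {
            if (char.IsLetterOrDigit(c))
            {
                if (pendingHyphen && sb.Length > 0) sb.Append('-');
                sb.Append(c);
                pendingHyphen = false;
            }
            else
            {
                pendingHyphen = true;
            }
        }
        return sb.ToString();
    }
}
```

For a made-up title, Slug.FromTitle("Michael Gibbons: Person in Profile") returns "michael-gibbons-person-in-profile".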

EDIT 23/5/2012

Since I wrote this post, Connemara.net has been revamped, courtesy of Noel at Connemara Publications. I removed the links to the old Connemara.net pages that are mentioned, but the thrust of the post is unaffected.

16 comments:

  1. 1. I suggest you make your urls lowercase such as words/articles/{id}/{hyphenatedTitle}.

    2. Your hyphenated title is commonly referred to as a slug; see http://en.wikipedia.org/wiki/Slug_(web_publishing)

    Why don't you add this in the route value dictionary when you redirect? Otherwise the user / search engine only sees as far as the id in the url.

    3. Your repository call looks bizarre and it looks like you are not using a service locator. For starters I would be making these calls via an interface both for article search and ResolveOldId;

    interface IArticleRepository
    {
        Article Search(int id);
    }

    interface IIdResolution
    {
        int ResolveOldId(string currentUrlPart);
    }

    so your calls would be:

    var oldId = _idResolution.ResolveOldId(oldPath);
    and later
    var article = _articleRepository.Search(oldId);

  2. Thanks boon,

    1. Why? I mean, consistency would be nice but why all lowercase?

    2. Never knew it was called a slug! Thanks. It isn't in the redirected route because I got lazy. Well spotted. You're right, of course, it should be in there. In fact, that's kinda one of the main points. Curse you. But thanks.

    3. No, not using a service locator. Reckon it would be overkill for my site.

  3. My 2 cents:

    1. I think lowercase is just more of a convention - do browsers/web servers treat them as case sensitive? I know IIS doesn't seem to care, which is what your ASP.NET MVC site is running on.

    2. Slug is such a cool word for that. I can already see your urls oozing all over and leaving snail trails all over my Chrome man!

  4. As for lower case, it's kind of a convention. Search engines may view differently cased URLs which point to the same resource as different resources. Some hosts care about the case when accessing a resource on disk (not relevant in MVC / Windows hosts).

  5. Thank you for sharing this guide, I just followed this and it worked perfect.

  6. Hi, Ralph. . .

    I googled "Letters from home" in an idle moment recently, and one of the hits brought me here; it was your comment about how LFH was lost. Well, as it happens, I have a photocopy of the LFH, and I can post it to you if you like. Email me at: euge1936@gmail.com

    cheers!

    euge
