Linkrot on Steroids: The Problems with URL Shorteners

As Simon Owens recently observed, tr.im — a service that shortened URLs — is now gone. The links that it once helpfully compressed are now useless. For those who may have passed on a link to a pal, tweeted a particularly helpful article, or otherwise stopped an unruly URL from breaking in two because of a monitor’s constraining width, those compressed links now mean nothing. How long will it be before all the other URL shortening services are about as valuable as a maniac with a fetish for smearing Crisco on random monitors or some sad and anonymous man who wastes his entire weekend on the Internet pretending to be somebody else on Twitter? Twhirl, the Adobe AIR app aiding folks in posting silly thoughts and links to Twitter, presents us with digg.com, is.gd, bit.ly, snurl.com, and twurl.nl as link-shortening options, all desperately needed if anyone expects to squeeze a link into the 140-character limit. But will these shorteners even exist in six months? Shouldn’t the mad scientists at Twitter come up with an in-house standard to ensure some longevity? (All this, of course, assumes that our tweets, or anything we put online, are even permanent — a subject I rambled about at length last week.)
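One defensive habit, at least while the shorteners still resolve: expand every short link back to its long form before you pass it along or file it away. Here is a minimal sketch in Python of what I mean; the short link in it is hypothetical, and the whole thing simply follows the HTTP redirects a shortener issues and reports where they land.

```python
# A minimal sketch: follow a shortener's redirects to recover the
# original long URL while the shortener still resolves. The short
# link below is hypothetical; substitute any bit.ly or is.gd URL
# you actually care about.
import urllib.request

def expand(short_url):
    """Return the final URL after all HTTP redirects are followed."""
    with urllib.request.urlopen(short_url) as response:
        return response.geturl()  # the URL we actually ended up at

if __name__ == "__main__":
    print(expand("http://bit.ly/example"))  # hypothetical short link
```

Once tr.im or bit.ly folds, that redirect is gone for good; the ten lines above cost nothing to run today and preserve the one piece of information the shortener was holding hostage.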

There’s also the problem of linkrot. The ever-shifting Wikipedia page on the subject suggests that Tim Berners-Lee was the first person to warn against these constantly changing links. Some extremely lazy excavation reveals that Jakob Nielsen was on the case on June 14, 1998, pointing, with unintentional and unanticipated irony, to “a recent survey by Terry Sullivan’s All Things Web.” But the link today is no longer good. I consult the Wayback Machine, waiting a few patient minutes for some hopeful snapshot of the Sullivan site in question, getting a total of 91 versions between 1998 and 2008. And of course, a click to one of these surrogate McCarthy functions takes another 40 seconds, and I don’t know which version is even the optimal one. And I find dramatic differences between the last version of the site in 2008 and the first version of the site in 1998. To name just one modification, the 2008 version reveals that the survey was conducted in April 1997. (I am directed to the actual survey, which thankfully still maintains its original URL. But for how long?) There is no such date in the 1998 version. The 2008 version compares three State of the Web surveys. But what if we want to know what Terry Sullivan wrote about the original survey in 1998? The new page gives no indication that Sullivan changed it and doesn’t direct us to an older version. And if you try to call up All Things Web in Firefox 3.5.2, you get a 403 error. What was once public is now private or “down for maintenance” (as of August 9, 2009, 9:13 PM EST). Nielsen referenced only general details in his piece, along with the original URL, which the patient among us can attempt to excavate through the Wayback Machine.
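For what it’s worth, the snapshot hunt doesn’t have to proceed one 40-second page load at a time. The Wayback Machine offers a query endpoint that lists every capture of a page at once. A rough sketch, assuming the archive’s CDX endpoint behaves as documented (the placeholder URL at the bottom is mine):

```python
# A sketch, assuming the Wayback Machine's CDX query endpoint
# (web.archive.org/cdx/search/cdx), which lists every capture of a
# page in one request instead of one click at a time.
import json
import urllib.request
from urllib.parse import urlencode

def snapshots(url, start_year, end_year):
    """Return a list of capture records for the given URL."""
    query = urlencode({
        "url": url,
        "from": str(start_year),
        "to": str(end_year),
        "output": "json",
    })
    endpoint = "https://web.archive.org/cdx/search/cdx?" + query
    with urllib.request.urlopen(endpoint) as response:
        rows = json.load(response)
    if not rows:
        return []
    header, captures = rows[0], rows[1:]  # first row names the fields
    return [dict(zip(header, row)) for row in captures]

if __name__ == "__main__":
    # "example.com" is a placeholder; substitute the All Things Web
    # address to reproduce the 91-version count described above.
    for snap in snapshots("example.com", 1998, 2008):
        print(snap["timestamp"], snap["original"])
```

It won’t tell you which of the 91 versions is the optimal one, but at least you can see the whole chronology before committing your 40 seconds.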

But let’s say that Nielsen had used something like tr.im to point to Sullivan. Would we be able to conduct this experiment? Instead of having 91 versions of Sullivan’s website to examine, we’d have to perform some guesswork, assuming the page was referenced by others and assuming that this was the only page in which Sullivan wrote about the “recent survey” in 1998.

Let’s also consider that all of the content and all of the links that we type into Twitter (or, for that matter, a webmail service) involve relying on a third-party website. A third-party website that has been prone to outages, lost tweets, lost followers, and lost information. What steps, then, is Twitter taking to ensure that all of the data generated at a historical moment is preserved? What are the URL shorteners doing to ensure that the original, full-length URLs are preserved?

Five years from now, will anyone investigating the manner in which CNN and The New York Times relied on Twitter for their news about recent events in Iran be able to check the original data that these ostensible reporters relied on? Will these reporters keep any notes they generated? Will their links still be good? Will the New York Times’s links still be around? (Hell, will the New York Times even still be around?)

Our cavalier refusal to ask these questions only exacerbates the problem of linkrot. There are, thankfully, methods of backing up your Twitter data, but how many Twitter users will even do this? We are forced by necessity to shorten the links, but “abuse of the service” may cause that service to be temporarily disabled. Bit.ly helpfully offers a “history” of recently shortened links, and it tracks the URLs you’ve shortened even if you’ve never signed up or signed in. But days later, the history is cleared.
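Since the shorteners won’t keep this ledger for us, we might as well keep it ourselves. A rough sketch of the habit I have in mind (the filename and the example pairing are mine, not a feature of any service): before a shortened link goes out into the world, record the long and short forms in a file on a disk you control.

```python
# A sketch: keep your own record of every shortened link, since
# bit.ly's "history" evaporates and tr.im is already gone. The CSV
# filename is arbitrary; the point is that the mapping lives on a
# disk you control, not on a third-party service.
import csv
import datetime
import os

LEDGER = "shortened_links.csv"

def record(long_url, short_url):
    """Append a long/short URL pair, with a timestamp, to the ledger."""
    is_new = not os.path.exists(LEDGER)
    with open(LEDGER, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["recorded_at", "long_url", "short_url"])
        writer.writerow([datetime.datetime.now().isoformat(),
                         long_url, short_url])

if __name__ == "__main__":
    # Hypothetical example pairing.
    record("http://www.example.com/some/unruly/very/long/url",
           "http://bit.ly/example")
```

It’s crude, but a flat file you own will outlive any shortener’s goodwill.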

Just for fun, I performed an advanced Twitter search for all uses of “bit.ly” through February 28, 2009. “No results for bit.ly until:2009-02-28.” I know this cannot be. But let’s give Twitter the benefit of the doubt. All uses of “tinyurl” through February 28, 2009? “No results for tinyurl until:2009-02-28.”

These search results are, as anyone who has used Twitter and URL shorteners in the past two years can attest, outright wrong. Twitter lacks the resources to preserve our data from six months ago. How can we expect it to preserve our data six months from now? In our great rush to adopt tools of change, our failure to back up the data we’ve already generated is the Internet’s equivalent of the explosive silver nitrate film stock and reckless cataloging that have permitted only 10 to 15% of silent movies to survive, with the remainder thought to be lost forever. (And who knows if there will be some online answer to Carl Bennett?)

But then many of the prospective answers to these questions depend on how much we value the services we’re using, and just how much we’re willing to waste our weekends on a desperate effort at tenuous restitution.