Cool URLs

An idea about preserving web content, even though the author acknowledges difficulties in maintaining original URLs.

A great idea

According to the people who think about web standards, the World Wide Web Consortium (W3C), Cool URIs don’t change. It’s a basic idea: once a page on the internet is created, it should – in theory and with enough money – stay at the same address forever. Always there and available for reference. I think the idea is sound. We have all come across links that no longer work; a problem that became known as link rot. I wrote about it in June 2004, in a post entitled Learning from Others.

Harder in practice

I might like and support the idea, but as I enter my 29th year writing on the web, I know I’ve been unable to honour the concept. For example, the link above is not, strictly, the right one for the ‘Learning from Others’ post. I might argue that the version here, which is on the domain I used to use for blogging, is more accurate. The content is the same, they’re still my words, but musak.org was the original home.

That’s still not the original URL, however. Sometime in the mid-2000s I archived the musak.org site when I switched blogging platform. I imported posts into the new tool without much thought. I wasn’t sure I was going to keep the old site around. I also copied some of the posts into curnow.org so that I would keep a copy even if I killed off the other site. The closest to the original URL is now at the wonderful Internet Archive (or Wayback Machine), and is a snapshot from July 2004: Archived: Learning from Others.

If you didn’t know about the archived version and tried to go to the original post it would generate a ‘page not found’ type of error; 404, in internet speak. Even worse, there would be almost no clue that it’s still possible to read the original words. I could do something clever on the server to rewrite the links. Maybe I’ll get to that when I have time to write some code.
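A minimal sketch of what that server-side rewriting might look like, written as a plain Python function rather than for any particular web server. The paths, host names and the `MOVED` table are all hypothetical placeholders; the only real detail is the Wayback Machine address pattern, which redirects to the capture closest to the timestamp given.

```python
from typing import Dict, Tuple

# Hypothetical mapping from retired paths to a post's current address.
# The paths and example.org are placeholders, not real URLs from the site.
MOVED: Dict[str, str] = {
    "/archives/old-post.html": "https://example.org/2004/06/old-post/",
}

WAYBACK_PREFIX = "https://web.archive.org/web/"


def resolve_old_url(
    path: str,
    old_host: str = "old.example.com",   # assumed original domain
    timestamp: str = "20040701000000",   # assumed date to aim for in the archive
) -> Tuple[int, str]:
    """Return (HTTP status, redirect target) for a request to a retired path."""
    target = MOVED.get(path)
    if target is not None:
        # 301: the post has a known, permanent new home.
        return 301, target
    # No known new home: send the reader to the archive instead of a bare 404.
    # A temporary redirect (302) is a judgment call, since the post may return.
    return 302, f"{WAYBACK_PREFIX}{timestamp}/http://{old_host}{path}"
```

A server would call something like `resolve_old_url(request_path)` for any address that no longer exists and issue the redirect it returns, so a reader following an old link lands on the words rather than on an error page.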

Correcting link rot

Those original posts were not updated when I mothballed the site into the new platform. As a result, musak.org had quite a bit of internal link rot. Occasionally, I look back and read something old and decide to correct the internal links. Eventually, I will finish that task and everything will be properly linked.

While I am in ‘correction mode’, I also check the other outbound links on those old posts. If they no longer work, I update them. If I can find an online version of the original text at a different URL, I link to that instead. If I can’t, I try the Internet Archive. If I can find neither, I leave the broken link.
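The order of checks described above can be sketched as a small function. This is only an illustration of the decision, not a tool I actually run: the three lookup functions passed in are hypothetical, and in practice the searching is done by hand.

```python
from typing import Callable, Optional


def choose_link(
    url: str,
    is_live: Callable[[str], bool],                   # hypothetical: does the link still work?
    find_new_home: Callable[[str], Optional[str]],    # hypothetical: same text at a new URL?
    find_snapshot: Callable[[str], Optional[str]],    # hypothetical: Internet Archive copy?
) -> str:
    """Pick the address an old post should link to."""
    if is_live(url):
        return url                 # still works, nothing to correct
    moved = find_new_home(url)
    if moved:
        return moved               # prefer the live version of the original text
    snapshot = find_snapshot(url)
    if snapshot:
        return snapshot            # fall back to the archived copy
    return url                     # found neither: leave the broken link as it is
```

Passing the lookups in as arguments keeps the decision separate from however the checking is done, whether that is a script or a person with a search engine.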

Last summer, James Cridland wrote about Fixing 404 errors and link rot, while maintaining authenticity. He took a different approach to updating dead links. I think his path is more in line with the ‘cool URL’ concept, but I’m happy with my compromise.

My weeknotes

When I started my weeknotes, I decided to prepare for future link rot and preemptively included a reference to the Internet Archive version of all the things I’d linked to in that week’s note. That way, I knew there would be a snapshot taken around the time I wrote a note and, in the future, it would be easier to navigate to the archive if a link was broken.
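For illustration, an ‘Archive’ list of that kind could be generated along these lines. This is a sketch under assumptions: the function names and the sample link are made up, and it only builds the addresses, relying on the Wayback Machine redirecting a dated address to the nearest snapshot it holds. It does not ask the Archive to take a snapshot.

```python
from datetime import date
from typing import List, Tuple


def wayback_url(url: str, on: date) -> str:
    """Address of the Wayback Machine capture closest to the given date."""
    return f"https://web.archive.org/web/{on:%Y%m%d}/{url}"


def archive_section(links: List[Tuple[str, str]], on: date) -> List[str]:
    """One 'title: archived address' line per (title, url) pair in a note."""
    return [f"{title}: {wayback_url(url, on)}" for title, url in links]


# Hypothetical example: one link from a note written on 7 January 2023.
for line in archive_section([("An example post", "https://example.com/post")], date(2023, 1, 7)):
    print(line)
```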

I have been reviewing my 2023 weeknotes. It’s an interesting exercise to understand my year. But, I think the ‘Archive’ section that includes the Wayback Machine links makes reading a series of notes harder than it need be.

So, while I’m going to make sure all the links are added to the Internet Archive whenever I post a new weeknote, I’m dropping that section.

My URLs, however, will stay cool (perhaps the only thing I do that is).

It Was Sixteen Years Ago Today

What does the web of 1 October in previous years tell us?

A few months ago I wrote that sometimes, “I come to visit my website just to look at the Blast From The Past section.” Admittedly, I don’t do this very often but today, on my morning commute, I did and found three entries from the first day of October in years gone by: 2002, 2004 and 2010. As a view of the past, I thought they made an interesting set of posts to study.

Eight years ago I was sharing interesting news links from the world of digital advertising on an almost daily basis; something that you’d find on Twitter today and not languishing on a blog. As I’ve said before, Twitter is probably a better place for such updates. Back then, I expressed surprise that part of the digital ad world was described by AdWeek as a ‘cesspool’: I thought it was a little extreme. Today, I’d probably not be so surprised and I might even agree with that description.

The ‘cesspool’ comment was used in a session at AdWeek 2010 where, “[T]he easy availability of low-cost online advertising space was a theme, and a problem, the panel returned to several times” [quote]. I imagine many of those people have come back to that theme a good few times since then! I wonder how many of the attendees eight years ago are amongst the podcasters, influencers and digital prophets at AdWeek 2018. Certainly, those are three roles that did not exist at the turn of the millennium, when the other 1 October entries were written.

In 2004 I wrote about a phone being stolen, which seemed quite important at the time but, from today’s vantage point, the focus on the newspaper headline of the day is much more interesting. These days I have no idea what the headline on the evening paper is as I head home and it’s unlikely I’m using my phone camera to grab a snapshot. Somewhere along the way, at least to me, headlines became less interesting because my news sources were much more personalised and my experience of the Evening Standard today (primarily accessed via a news aggregator on my phone) will be different to yours.

But it’s the sixteen-year-old entry that really caught my attention. How does the ‘Snapshot of the Blogsphere’ stand up today? It’s rather poor: none of the three links noted is accessible from its original page because none of the sites is active anymore, although Tom’s plasticbag.org is still archived even if the links are broken (a little bit of searching does come up with the original entry). There is so much of the early web that’s gone. Fortunately, the Wayback Machine has some kind of copy of the material and I have been able to update the original links (see: Tom, Meg, Bart). It’s not great because I don’t imagine many people will go searching for them if they get a ‘not found’ error. I wish there was a way to prevent this but what can be done when the owners don’t want to keep their sites going anymore?

I’m glad I managed to rescue the snapshot of 2002. I don’t read anywhere near as many blogs as I did back then but, just in case I want to check in with myself in another 16 years, here’s a quick look at what I read today:

  1. Some years getting the Gold Card discount added to my Oyster is really simple, and other years everyone shakes their head and says “no, we can’t do that here, go away”. This year’s attempt proved almost, but not quite, at the easy end of the scale. [DiamondGeezer]
  2. Musk doesn’t deserve to be compared to Steve Jobs, he’s a category unto himself. He has improvised on a scale we’ve never seen before and has forced the incumbents to wake up and adopt EVs as their future. [Monday Note]
  3. It was not hard to see why Trump hadn’t seen the point in preparing to take over the federal government: why study for a test you will never need to take? [kottke.org]

Let’s just make sure the Wayback Machine has a copy.

It Was A Good Read

I always feel it’s a little sad when a blog dies – particularly when all trace of it is removed. If it’s a blog I have been reading for some time then it feels as if a part of my history disappears. It is one of the strange things about the online experience – it’s very easy for things to disappear; things that were once inspirational, useful or entertaining.

One of my earliest online inspirations was Jase Wells. Although I’d been trying out building web pages for the company I worked for, Jase was the inspiration for my first home page (sadly long gone from the servers on which it resided and a great example of what I am talking about). Jase is still alive and well but the focus of his site has changed and, while it’s updated much more often now, the coming out story that was such a useful resource has gone (although it’s still available via archive.org).

Another Jase, now Snoboardr of OutEverywhere, had some personal pages once that were also fairly important in my use of the web.

Then there are the blogs that disappear. Mike of Troubled Diva fame (who I was introduced to via the excellent 40in40) put the blog on indefinite hold at the beginning of December. 8Legs went the same way a few weeks later. And now Chris has packed up. I don’t know Chris, nor have I ever mailed him or commented on his site, but I read it almost religiously. Why? Well, he has a talent for writing to the extent that almost everything he wrote was compelling. It was his writing style which was an inspiration because, by the time I discovered his site, I had been writing this blog for a while.

At least Daniel’s said it’s unlikely that he will give up completely.

While I will miss the disappearances, they are, of course, just blips in the workings of the web. What I find sad is that, in time, it is likely that all this content will disappear from servers as the owners stop paying for the space that houses the sites. It would be like burning every copy of a book you had read – vanished. It’s part of a shared history that disappears.

Diary writers perform an unintentional function as social historians. If you go all the way back to Pepys or think more recently of somebody like Kenneth Williams, their diaries are read today and give us an insight into what the world was like. If Mike or Chris had written their blogs as paper-based diaries there may very well have been something for historians to use in the future. If they don’t keep some kind of record of what they wrote in an accessible form then it will be lost to the future and people trying to understand life in the 21st Century will be poorer.

So, to those who wrote content I enjoyed reading, a plea. Archive your content for future generations. Regardless of how you do it, keep it.

Oh, and thanks for sharing your thoughts. I enjoyed them all.