I posted here about how knowledge on the web, and on digital media generally, disappears – risking the impoverishment of future historical research.
Just before I could post this follow-up, Jessica anticipated me and commented that I should try Archive.org. Well, guess what – this is all about that.
A recent interview with British Library chief Lynne Brindley in The Guardian discussed some positive efforts to archive the web, notably the San Francisco-based Internet Archive.
In San Francisco, the non-profit Internet Archive automatically scrapes parts of the web and its Wayback Machine allows people to surf back in time to see what their favourite sites looked like as far back as 1996. It already contains three petabytes of data, which equates to more than three million gigabytes.
All well and good. But what the interview doesn’t mention is that the Internet Archive itself is losing its digital information.
Way back, I used to know someone called Tim Worman, who became better known as Tim Polecat – lead singer of the UK rockabilly band The Polecats. We lost touch, obviously (I don’t really move in pop star circles), but about 10 years ago, I thought I’d see if he was on the web.
He was! He had a fun site with all the usual stuff about his interests and current news – and because he was also a good artist and designer, it looked pretty cool.
I checked back every so often, but then a few years later was disappointed to find it no longer seemed to exist. Aha – but no: there it was. Archived by the Internet Archive and accessible through its Wayback Machine (though sadly without some of the graphics and MP3 downloads).
I visited occasionally and then – guess what? Yes – his site had vanished from the Internet Archive too.
The obvious question, then, is what use is an internet archive that keeps things for just a few years? If Tim Polecat’s site was valuable at all, surely it should be kept in perpetuity. If it’s not actually valuable, then why keep it at all – for any length of time?
Maybe the Internet Archive scrapes the web automatically and then real people wade through the content it stores to decide what’s valuable and what isn’t – a process that would obviously take a while. So perhaps his site was only archived until someone got a chance to have a look at it and decided it was of no use.
But that undermines the very principle of archiving ephemera that the British Library is so concerned about. After all, it is from some of the most trivial material that we gain some of our most important insights into the lives of ancient peoples. What they considered important at the time is not necessarily what concerns historians today – and we have no idea what future historians will want to know about us.