Webometric Thoughts

November 23, 2007

National internet archives have a long way to go

Archiving the web is a massive job, and whilst the Internet Archive does as good a job as can be expected from a single centralised organisation, there really is a need for better national web archives. I am brought to the beginnings of a little rant by ResourceShelf highlighting Canada’s new government web archives. Whilst I am sure that this will do a great job of archiving the Canadian government’s web sites, it is a drop in the ocean compared with the number of Canadian web sites that could and should be kept, and it seems to have been an awfully long time coming.

However, the British really don’t have a leg to stand on when it comes to complaining about archives, as our own archive is a particularly sorry affair, based on the crawls of fewer than 3,000 web sites. Rather than following the route of the Canadian archive and covering a particular domain exhaustively, it chooses instead to select various sites (with the permission of their owners) of interest to the consortium members. As they say themselves, “there is a danger that invaluable scholarly, cultural and scientific resources will be lost to future generations”. Personally, I feel these archives do little to stop the vast majority of resources being lost.

We wouldn’t accept a national library that contained such a pitiful selection of books, but somehow we allow such pathetic web archives to continue. In the UK I would like to see:
-The British Library given the right to copy every web page, in the same way as it has the right to a copy of every book (there should be no need to ask for permission, and selection misses too much).
-It should be provided with the expertise and money to archive the whole of the .uk domain.
-If necessary, to appease those who confuse the public world of the web with the private mutterings of a conversation and the thoughts of a diary, it should allow pages, on request, to be deep-archived* for a period of time rather than permanently deleted.

Whilst I am sure that institutions such as the British Library are trying to improve the UK’s web archive, the current outputs seem remarkably underwhelming for a supposedly rich nation at the forefront of the world’s knowledge-economy.

*deep-archived: I couldn’t think of a better term to describe something that is in the archive but not public; “private” seems to have different connotations.
