Webometric Thoughts

August 21, 2008

Iterasi: Create your own archive!

Filed under: archive, national archives, web archives, webometrics — admin @ 12:27 pm

The UK’s web archive is pretty rubbish, so Iterasi (highlighted by TechCrunch) is a great addition to the web.

Rather than merely bookmarking a URL, you can archive the actual page, and you can continue archiving it on a regular basis if you wish. The only downsides are that the site only lets you archive once a day (for the front pages of news sites you may want to archive more frequently), and that it only archives while your computer, which holds the list of scheduled saves, is turned on.

The potential for webometric studies is obvious: it would seem that even the most technologically incompetent of us can now collect longitudinal data with ease. For example, Google searches could be archived daily to see how the results, or the reported number of hits, change over time…and once you have archived a page, it is very simple to embed it elsewhere.
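
To give a flavour of the do-it-yourself approach, here is a minimal Python sketch of that kind of daily collection: fetch a page and save a timestamped copy locally. The target URL and folder are illustrative assumptions, and this is not Iterasi’s own mechanism, just the same idea in script form.

```python
# Minimal sketch of do-it-yourself longitudinal collection: fetch a page once a
# day and keep a timestamped local copy. URL and folder are illustrative only.
import datetime
import pathlib
import time
import urllib.request

ARCHIVE_DIR = pathlib.Path("snapshots")    # assumed local folder for copies
TARGET_URL = "http://news.bbc.co.uk/"      # example front page to track

def snapshot(url: str, out_dir: pathlib.Path) -> pathlib.Path:
    """Fetch `url` and write the raw HTML to a timestamped file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y-%m-%d_%H%M")
    out_file = out_dir / f"{stamp}.html"
    with urllib.request.urlopen(url) as response:
        out_file.write_bytes(response.read())
    return out_file

if __name__ == "__main__":
    # Runs once a day while the machine is on (mirroring Iterasi's limitation);
    # in practice a cron job or Task Scheduler entry would be more robust.
    while True:
        print("Saved", snapshot(TARGET_URL, ARCHIVE_DIR))
        time.sleep(24 * 60 * 60)
```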

It also has potential for bloggers: when discussing a page or story, they can now be sure that their readers will see the page as it was, rather than a later, updated version. How content providers will react to the archiving of their content remains to be seen.

November 23, 2007

National internet archives have a long way to go

Archiving the web is a massive job, and whilst the Internet Archive does as good a job as can be expected of a single centralised organisation, there really is a need for better national web archives. I am brought to the beginnings of a little rant by ResourceShelf highlighting Canada’s new government web archives. Whilst I am sure this will do a great job of archiving the Canadian government’s web sites, it is a drop in the ocean compared with the number of Canadian web sites that could and should be kept, and it seems an awfully long time coming.

However, the British really don’t have a leg to stand on when it comes to complaining about archives, as our own archive is a particularly sorry affair, based on crawls of fewer than 3,000 web sites. Rather than following the Canadian route of covering a particular domain exhaustively, it instead selects various sites (with the permission of the web owners) that are of interest to the consortium members. As they say themselves, “there is a danger that invaluable scholarly, cultural and scientific resources will be lost to future generations”; personally, I feel these archives do little to stop the vast majority of resources being lost.

We wouldn’t accept a national library that contains such a pitiful selection of books, but somehow we allow such pathetic web archives to continue. In the UK I would like to see:
-The British Library given the right to copy every web page, in the same way as it has the right to a copy of every book (there should be no need to ask for permission and selection misses too much).
-It should be provided with the expertise and money to archive the whole of the .uk domain.
-If necessary, to appease those who confuse the public world of the web with the private mutterings of a conversation or the thoughts of a diary, it should allow pages (on request) to be deep-archived* for a period of time rather than permanently deleted.

Whilst I am sure that institutions such as the British Library are trying to improve the UK’s web archive, the current outputs seem remarkably underwhelming for a supposedly rich nation at the forefront of the world’s knowledge-economy.

*deep-archived…I couldn’t think of a term to describe something that is in the archive but not public; “private” seems to have different connotations.
