A couple of posts ago I was complaining about how annoying my job was as I tried to draw conclusions from the jumbled mess of environmental technology websites. Today’s post points out that it isn’t always such a jumbled mess.
I have just done a far smaller (and less scientific) data collection for a presentation I am doing in Wolverhampton tomorrow:
It is a link diagram of a few web sites in Wolverhampton and the surrounding area to illustrate the sort of work my research group does.
What is noticeable from a webometric perspective is how many of the web sites included in the study are actually connected: you can link anywhere in the world, but the web is primarily a local affair.
At the moment I am investigating the linking between 1337 environmental technology web sites. Of the 1337 sites, 751 nodes create one large network:
You spend days sorting a list of URLs, collecting data, finding errors, starting again…and at the end you just have a big ball of string.
A webometrician’s job is to draw conclusions from such a jumbled mess: I hate my job.
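For anyone wondering where a figure like "751 of 1337" comes from: it is presumably the size of the largest connected component of the link network. Below is a minimal sketch of how that could be computed, assuming the crawl has been boiled down to a list of (source site, target site) pairs and that a link in either direction counts as a connection. The function name and the example.org sites are made up for illustration; this is not the tool I actually used.

```python
from collections import defaultdict, deque


def largest_component(edges, nodes=()):
    """Return the largest connected component as a set of site names.

    edges: iterable of (source, target) pairs, one per inter-site link.
    nodes: optional iterable of sites, so that unlinked sites are counted too.
    Links are treated as undirected (an assumption of this sketch).
    """
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    for node in nodes:
        neighbours[node]  # touching the key registers an isolated site

    seen = set()
    best = set()
    for start in list(neighbours):
        if start in seen:
            continue
        # Breadth-first search outwards from an unvisited site.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            site = queue.popleft()
            for other in neighbours[site]:
                if other not in seen:
                    seen.add(other)
                    component.add(other)
                    queue.append(other)
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy data: five linked sites and one site with no links at all.
links = [
    ("a.example.org", "b.example.org"),
    ("b.example.org", "c.example.org"),
    ("d.example.org", "e.example.org"),
]
sites = ["a.example.org", "b.example.org", "c.example.org",
         "d.example.org", "e.example.org", "f.example.org"]
big_ball_of_string = largest_component(links, sites)
print(len(big_ball_of_string), "of", len(sites), "sites are in the largest network")
```

Passing the full list of sites as well as the links matters, because sites with no links never appear in the edge list and would otherwise drop out of the total.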
One of the problems with the web is that it is just too damned big: just as you think you are up to date with everything in one area, you suddenly realise that there is a whole other area that has totally passed you by. For me that area is Open Data: the practice of making data freely available to everyone. Whilst I had heard a few rumblings, I didn’t really appreciate how much was going on, or some of the tools that were available, until reading an article in the last issue of Online Magazine. Webometricians create massive amounts of data, and whilst we know we should do more, we generally use the data we gather as the subject of academic papers or blog posts; then it sits on our hard drives until we forget where it was from and what it represents (personally I have gigabytes’ worth of data in text files that is now totally meaningless to me).
In future I will definitely make a concerted effort to make data available on Open Data sites (whether people like it or not), not only because of the movement’s worthy ethos, but also for the selfish reasons of a useful repository and the benefits of some useful tools. Of the many open data sites, my first experimentation has been with IBM’s Many Eyes (http://services.alphaworks.ibm.com/manyeyes/home), which, whilst suffering from a few bugs, has some great visualisation applications, including network diagrams:
This particular network comes from my ever-so-successful PhD thesis. It shows the interlinking between the web sites of 64 members of the Association of the British Pharmaceutical Industry, as seen through the Microsoft Live Search API (in the glory days of access to both the linkdomain and linkfromdomain operators). Obviously not particularly awe-inspiring here, but earth-shattering in the context of 130 other pages.
Additional open data sites include Data360, Swivel, Freebase, and many more. Whilst I’m sure that different people will find different sites more appropriate to their needs, the main thing is that we (especially academics) start getting the data out there…and more than the off-the-cuff 95 lines I uploaded for the above diagram.