Webometric Thoughts

April 25, 2008

Open Data – post 2

Filed under: Twitter,frequency cloud,open data,word tree — admin @ 2:59 pm

I posted earlier about open data, and included an example of the sort of network diagram Many Eyes allows. Following this, I decided to see the sort of things it could do with some text, and uploaded some data I had collected from Twitter back in February.

Not a particularly great set of data, but something interesting to play about with, starting with word frequency clouds:

And a word tree:

I will definitely be keeping the potential of these tools in mind as I collect bits and pieces from the web in the weeks ahead.
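For anyone wondering what a frequency cloud is actually counting, here is a minimal sketch in Python. It is not how Many Eyes works internally (that is not something I know); it simply assumes the text arrives as a list of short messages and counts lower-cased words. The function name and the sample messages are made up for illustration.

```python
import re
from collections import Counter


def word_frequencies(texts, top_n=10):
    """Count lower-cased words across a list of texts (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        # Treat runs of letters and apostrophes as words; everything else is a separator.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(words)
    return counts.most_common(top_n)


# Made-up sample messages, standing in for the collected Twitter data.
sample = ["Open data is great", "Playing with open data tools"]
top_words = word_frequencies(sample, top_n=2)
print(top_words)
```

A cloud then just scales each word's font size by its count; in practice you would probably want to drop very common words first.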

Open Data: A link diagram

Filed under: link analysis,network diagram,open data — admin @ 10:13 am

One of the problems with the web is that it is just too damned big: just as you think you are up to date with everything in one area, you suddenly realise that there is a whole other area that has totally passed you by. For me that area is Open Data: the practice of making data freely available to everyone. Whilst I had heard a few rumblings, I didn’t really appreciate how much was going on, or some of the tools that were available, until reading an article in the last issue of Online Magazine. Webometricians create massive amounts of data, and whilst we know we should do more, we generally use the data we gather as the subject of academic papers, or blog posts, then it sits on our hard drives until we forget where it was from and what it represents (personally I have gigabytes worth of data in text files that is now totally meaningless to me).

In future I will definitely make a concerted effort to try and make data available on Open Data sites (whether people like it or not). Not only due to the movement’s worthy ethos, but for the selfish reasons of a useful repository and the benefits of some useful tools. Of the many open data sites my first experimentation has been with IBM’s Many Eyes (http://services.alphaworks.ibm.com/manyeyes/home), which, whilst suffering from a few bugs, has some great visualisation applications, including network diagrams:

This particular network comes from my, ever-so-successful, PhD thesis. It shows the interlinking between the web sites of 64 members of the Association of the British Pharmaceutical Industry, as seen through the Microsoft Live Search API (in the glory days of access to both the linkdomain and linkfromdomain operators). Obviously not particularly awe-inspiring here, but earth-shattering in the context of 130 other pages.
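As a rough sketch of the kind of data that sits behind a diagram like this, the Python below assumes the links are stored as one tab-separated "source, target" pair per line. That layout is an assumption for illustration, not necessarily the format that was uploaded, and the domain names are placeholders rather than real ABPI members.

```python
from collections import defaultdict


def parse_edges(lines):
    """Turn tab-separated 'source<TAB>target' lines into a set of directed links."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        source, target = line.split("\t")
        if source != target:  # ignore a site linking to itself
            edges.add((source, target))
    return edges


def inlink_counts(edges):
    """Count how many distinct sites link to each target site."""
    counts = defaultdict(int)
    for _, target in edges:
        counts[target] += 1
    return dict(counts)


# Placeholder domains, not taken from the thesis data.
lines = [
    "a.example\tb.example",
    "c.example\tb.example",
    "b.example\ta.example",
    "",
]
print(inlink_counts(parse_edges(lines)))
```

The same list of pairs is all a network diagram needs: each line becomes one arrow between two sites.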

Additional open data sites include Data360, Swivel, Freebase, and many more. Whilst I’m sure that different people will find different sites more appropriate to their needs, the main thing is that we (especially academics) start getting the data out there…and more than the off-the-cuff 95 lines I uploaded for the above diagram.
