Webometric Thoughts

January 22, 2010

Semantic Webometrics – A few thoughts

Filed under: semantics,webometrics — admin @ 9:37 am

The other day an academic colleague asked what I was working on at the moment. My answer included semantic webometrics and, unsurprisingly, he wanted some more detail. However, ‘working on’ would be a bit of an exaggeration; ‘have a few ideas but nothing on paper yet’ would have been more appropriate. As such I thought I’d write down some of my rough thoughts on semantic webometrics.

For those who may have stumbled upon this blog from a non-webometric background, Webometrics as defined by Björneborn (2004), and as used by most of the webometrics community, means the:

…study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web drawing on bibliometric and informetric approaches.

Many of these quantitative studies have focused on hyperlinks. For example, investigating whether there is a correlation between a university’s inlinks (a.k.a. backlinks) and a university’s research ranking, or whether the interconnectedness of organisations in a region (as seen through interlinking web sites) can give an indication of a region’s level of innovation [outrageous self-citation].

One of the problems with many of these link-analyses is that they include a lot of noise. For example, when counting a university’s inlinks you will be counting both those from an academic highlighting a university’s quality research, and those from the disgruntled student highlighting his most hated tutor. Traditionally we have tried to understand the extent of this noise through large scale content analysis – the extremely tedious manual classification of web links and web pages.

The semantic web
A semantic web is one where information on the web is structured so that it is meaningful to computers. Well-known examples of the semantic web include the FOAF ontology, which allows people to express their relationships with one another (e.g., the FOAF of Tim Berners-Lee), and the use of microformats for certain types of structured content including contact details (as included at www.davidstuart.co.uk) and reviews (which are now indexed by Google as Rich Snippets). This extra information can be used to reduce the amount of noise and enable meaningful webometric studies.

Semantic webometrics
So when I say semantic webometrics I mean – webometric studies that make use of the additional information included in an increasingly semantic web.

For example, a semantic webometric study of the connection between an institution’s inlinks and research ranking would take into consideration who had placed the links and the attributes that they had associated with them. A semantic webometric study of the relationships between organisations would look at the explicit relationships contained in FOAF files as well as the implicit information on web pages.

Unfortunately there is relatively little semantic information embedded in the majority of web pages/sites, and where it is widespread, e.g., with the nofollow link attribute, webometricians have yet to develop the tools to make use of it.

As such we need to take an information-centred approach to semantic webometric research rather than a problem-centred approach. Whilst still small, the amount of semantic data embedded in the web is increasing all the time; webometricians need to investigate what is available and how they can use it.
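
As a very rough illustration of the nofollow point above, here is a minimal sketch that counts the links on a page pointing at a target domain and splits them by whether the link carries rel="nofollow". The page fragment, the domain names and the function name count_inlinks are all invented for the example; it uses only Python's standard library and is not an existing webometric tool.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class InlinkCounter(HTMLParser):
    """Count links to a target domain, split by whether rel contains 'nofollow'."""

    def __init__(self, target_domain):
        super().__init__()
        self.target_domain = target_domain
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        host = urlparse(href).hostname or ""
        # Only count links pointing at the target domain or one of its subdomains.
        if host != self.target_domain and not host.endswith("." + self.target_domain):
            return
        rel_values = (attributes.get("rel") or "").lower().split()
        self.counts["nofollow" if "nofollow" in rel_values else "followed"] += 1


def count_inlinks(html, target_domain):
    """Return a dict of inlink counts for target_domain found in one page."""
    parser = InlinkCounter(target_domain)
    parser.feed(html)
    return dict(parser.counts)


# Hypothetical page fragment, invented purely for illustration.
sample_page = """
<p><a href="http://www.example.ac.uk/research/">Great research group</a></p>
<p><a rel="nofollow" href="http://www.example.ac.uk/staff/tutor">My most hated tutor</a></p>
<p><a href="http://elsewhere.example.org/">Unrelated link</a></p>
"""

print(count_inlinks(sample_page, "example.ac.uk"))
```

Whether a nofollow link should be discounted, weighted, or analysed separately is a research decision; the sketch only shows that the attribute is easy enough to extract.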

May 30, 2009

From Webometrician to Web Analyst?

Filed under: web analyst,webometrics — admin @ 11:55 am

On 22nd July 2009 my job as web 2.0 research fellow at the University of Wolverhampton finishes. As the only other webometrics research post currently available is in South Korea, and I’m not really a 9-5 office type person, I will [probably] be going into business for myself: Commercialising webometrics. Unfortunately, as there are only a handful of people who know what webometrics is and what a webometrician would do, the hunt is on for a new job title.

The most obvious job title is ‘web analyst’, although the slightly wordier ‘web analytics consultant’ would probably give a better indication of the services I can offer. Neither, however, sounds particularly cutting edge or exciting, or (like webometrician) rhymes with magician! Even after I have decided on a job title I will have to select names for the services I offer. Is ‘web impact analysis’ catchy enough? Naming children seems like a piece of cake in comparison.

One thing I am sure about: I will not be a search engine optimizer offering search engine optimization! Any other suggestions welcomed.

April 27, 2009

A Wolverhampton Network Diagram: It’s a local affair

Filed under: Wolverhampton,network diagram,webometrics — admin @ 1:37 pm

A couple of posts ago I was complaining about how annoying my job was as I tried to draw conclusions from the jumbled mess of environmental technology websites. Today’s post points out that it isn’t always such a jumbled mess.

I have just done a far smaller (and less scientific) data collection for a presentation I am doing in Wolverhampton tomorrow:

It is a link diagram of a few web sites in Wolverhampton and the surrounding area to illustrate the sort of work my research group does.

What is noticeable from a webometric perspective is how many of the web sites included in the study are actually connected: you can link anywhere in the world, but the web is primarily a local affair.

April 22, 2009

I Hate My Job: The Web is Just a Jumbled Mess!

Filed under: network diagram,webometrics — admin @ 9:18 am

At the moment I am investigating the linking between 1337 environmental technology web sites. Of the 1337 sites, 751 nodes create one large network:

You spend days sorting a list of URLs, collecting data, finding errors, starting again…and at the end you just have a big ball of string.

A webometrician’s job is to draw conclusions from such a jumbled mess: I hate my job.
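
For anyone wondering where a figure like ‘751 nodes in one large network’ comes from, it is the size of the largest connected component of the link graph. Below is a minimal sketch using toy data; the site names are invented, and treating links as undirected is an assumption made for the example rather than a description of the actual study.

```python
from collections import defaultdict, deque


def largest_component(sites, links):
    """Return the largest set of sites joined by links, ignoring link direction."""
    neighbours = defaultdict(set)
    for source, target in links:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    best = set()
    for start in sites:
        if start in seen:
            continue
        # Breadth-first search collects every site reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            site = queue.popleft()
            for other in neighbours[site]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy data: six sites, of which four form one network.
toy_sites = ["a.example", "b.example", "c.example",
             "d.example", "e.example", "f.example"]
toy_links = [("a.example", "b.example"),
             ("c.example", "b.example"),
             ("c.example", "d.example")]

component = largest_component(toy_sites, toy_links)
print(len(component), "of", len(toy_sites), "sites are in the largest network")
```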

February 20, 2009

A Philosophy of Linking: Does The Pirate Bay need a webometrician?

Filed under: Bill Thompson,Pirate Bay,Theory of Linking,webometrics — admin @ 9:34 am

As members of The Pirate Bay stand trial, Bill Thompson points out the need for a philosophy of linking:

The Pirate Bay case hinges on what counts as infringement, and whether simply linking to a site is enough to make someone liable, treating a hypertext link to a third-party URL as an endorsement, as something that makes a connection between two web pages or information sources that has real legal significance and weight.

Yet it is nothing of the sort. Ever since Tim Berners-Lee defined the Hypertext Markup Language and its Uniform Resource Locators one fundamental thing has applied – a link is just a link….

Perhaps we need a ‘philosophy of linkage’ to explore what the use of a link can signify, before the lawyers decide it for us and limit the creative potential of the web through their lack of imagination and understanding.

The theory of linking often comes up as a topic of conversation in webometrics, in much the same way as a theory of citation is discussed in bibliometrics. Unfortunately it often takes a back seat to those webometric areas with more obvious real-world applications, e.g., the creation of web indicators.

Only a couple of months ago a colleague and I started working on a ‘Theory of Linking’, but other work got in the way and the paper remains unfinished. Who knows, maybe if we had written the paper we could have been the first webometricians to be expert witnesses!

February 15, 2009

Twitter, Politics, and Looking for Meaningful Metrics

Filed under: Twitter,metrics,politics,twitometrics,webometrics — admin @ 11:57 am

As Twitter seems to be the latest shiny web site that has everyone interested, and with a general election on its way (well, June 2010 at the latest), I decided to see how the political parties have taken to Twitter.

The simplest comparison is between the raw numbers of the parties:
Obviously these numbers don’t look good for the Labour Party: not listening and not many followers. They don’t even have a single account, but rather two different streams with the same information.

Whilst such comparisons will be made with increasing regularity as the election approaches, for example:
…, we quickly realise we need to take into consideration a far wider variety of Twitter accounts and other metrics.

@DowningStreet, the official Twitter channel for the office of the Prime Minister, provides a totally different perspective on the Labour Party’s fortunes.
If @DowningStreet’s Twitter friends were an indication of support, Gordon could expect a landslide victory at the next general election. Unfortunately things are not that simple. As one comment to @DowningStreet shows, people follow for many different reasons:

any chance next week i can have a pic taken outside No.10? im visiting for a few days? i know its cheeky but i had to ask!

Obviously @DowningStreet is not the only other UK political Twitterer; many individuals, groups and departments have accounts, all contributing to the complex picture of the UK political landscape.

Twitter potentially offers a lot of useful information about both the attitude of the parties to the electorate, and the electorate to the parties. Unfortunately, as with all webometric studies, for meaningful answers to be arrived at there need to be distinct methodical steps rather than just a grabbing of raw data:
1) Select appropriate Twitter accounts to answer the research question.
2) Investigate Twitter interactions:
Not only ‘do they follow and have followers’, but are they ReTweeting comments and Responding to questions directed at them?
3) Investigate the nature of the interactions:
Unfortunately the simplest way of finding out the nature of many of the connections is to analyse the comments, a very long and tedious process.
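
Step 2 at least can be partly automated. The sketch below assumes you already have an account's tweets as plain strings, and it treats anything starting with "RT @" as a ReTweet and anything starting with "@" as a response; both conventions, the function name interaction_profile and the sample data are assumptions made for illustration.

```python
def interaction_profile(followers, following, tweets):
    """Summarise how an account interacts, beyond raw follower counts."""
    # Assumed conventions: "RT @user ..." marks a ReTweet, "@user ..." a response.
    retweets = sum(1 for text in tweets if text.startswith("RT @"))
    replies = sum(1 for text in tweets if text.startswith("@"))
    total = len(tweets)
    return {
        "followers": followers,
        "following": following,
        "retweet_share": retweets / total if total else 0.0,
        "reply_share": replies / total if total else 0.0,
    }


# Hypothetical tweets and counts, invented purely for illustration.
sample_tweets = [
    "RT @someone: Budget statement at 12.30",
    "@constituent Thanks, we will look into it",
    "New policy paper published today",
    "New video from the conference",
]

profile = interaction_profile(followers=1200, following=15, tweets=sample_tweets)
print(profile)
```

It says nothing about the nature of the interactions (step 3), which still needs someone to read the comments.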

As with so many things on the web, it would be interesting to investigate, if only one had the time.

February 5, 2009

An Unimpressive EThOS from the British Library

Filed under: British Library,EThOS,bibliometrics,webometrics — admin @ 8:45 am

One of the hundreds of posts in my feed-reader this morning was about the British Library electronic theses service (via SCIT blog). As my own thesis should be included I decided to indulge in a bit of vanity searching. Result: EThOS has a long way to go.

I would expect my thesis to turn up for the term ‘webometrics’, in fact it is about the only term for which someone might actually want to read it. Unfortunately the only webometric thesis belongs to Xuemei Li:

My thesis does however turn up for the wholly inappropriate ‘bibliometrics’:

Seemingly the reason for my appearance under ‘bibliometrics’ and not ‘webometrics’ is that ‘bibliometrics’ appears in my abstract whereas ‘webometrics’ does not. Whilst this may seem reasonable at first, theoretically the University of Wolverhampton are taking part in the project and their record includes a number of keywords carefully selected by me, including ‘webometrics’. The British Library also fails to provide a link to my thesis, despite it being scattered over the web like confetti: “Not yet available for download”.

Young academics brought up on Google Scholar, with full text searching and links to the numerous copies on the web, are unlikely to see the value in EThOS and its traditional OPAC style. Whilst I’d like to see an electronic thesis online service that separates the wheat from the chaff, with full text searching and links to the documents, and believe that librarians could aid in retrieval with classification of such documents, this is not what EThOS is currently offering. It’s still in Beta, and likely to improve, but it has a frighteningly long way to go and you do wonder whether they should have buddied up with one of the big search engines to produce a more user friendly version.

January 3, 2009

Webometric Word Clouds: an unscientific comparison

Filed under: Wordle,webometrics,word clouds — admin @ 5:14 pm

Whilst contemplating creating word clouds from search engine results (what else do people think about on a Saturday afternoon?) I started to wonder what my thesis would look like as a word cloud. More specifically, would it end up looking like the autobiography of Mike Thelwall? A quick copy and paste of 163 pages of text into Wordle later:

Maybe articles and theses should have a word cloud before the abstract to help users decide at a glance whether it is even worth reading the abstract.
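
Underneath, a word cloud is little more than a word frequency count with the common words thrown away. Here is a minimal sketch of that counting step; the stop-word list and sample sentence are invented for the example, and this is not how Wordle itself is implemented.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, assumed for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is",
              "that", "for", "on", "as", "with"}


def top_words(text, limit=10):
    """Return the most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(limit)


# Hypothetical sample text standing in for 163 pages of thesis.
sample = "Web links and web sites: the links between sites on the web."
print(top_words(sample, limit=3))
```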

How does my word cloud compare with other recent webometric theses?

November 21, 2008

Google SearchWiki: Cleaning up the Webometric results

Filed under: API,Google SearchWiki,webometrics — admin @ 11:53 am

For some reason Google always saves its big releases for those days when I am busy. Could it be that they are fearful of my criticism? Or merely coincidence? Whatever the reason I couldn’t help but push other things to one side and comment on Google’s new SearchWiki. Basically, when you are logged into your Google account at google.com (not currently google.co.uk) you can change the results you find on your home page: promoting results, hiding results, commenting on results. Whilst it only affects your results page, you can see how other people have ranked/commented on items, and it seems highly likely that Google will eventually incorporate the findings in its general search results.

SearchWiki is by no means a new idea; sites such as Aftervote (now Scour) have done it all before. The difference this time is the number of people Google can put to work on the idea. At the time of writing this blog a search for ‘Google’ had already had 908 people make notes; it would probably have taken Aftervote weeks if not months to get that many comments on a single search term. So what is the collective wisdom regarding the best search result for the term ‘google’ entered into Google.com…that’ll be Google.com. Personally I would have thought that people are more likely to be searching for one of Google’s other services or information about Google rather than the page they are already on, but no one ever accused the public of being overly bright.

As someone who likes to do his bit for collective wisdom, I have taken steps to clean up one of my most regular searches, ‘webometrics’:

Just the three adjustments: promotion of the most important site, questioning the validity of a colleague’s page, and the removal of a character who has no right to call himself a webometrician. But I am sure everyone would agree that such amendments improve the page astronomically.

Whilst I am sure that sheer weight of numbers will prevent the spamming of the top searches, it will be interesting to monitor the spam on the fringes. Will people be looking at the notes others have made? I will. SearchWiki seems as though it will give great insight into what people think of different sites; I just hope Google adds it to their API.

UPDATE: Whilst I initially said it was only available on Google.com, it’s seemingly not as simple as that. When I log into Google.com with my webometrics account I get SearchWiki, when I log in with my gmail account I don’t get SearchWiki! It seems as though they are taking steps to restrict access geographically.

October 28, 2008

Does Bibliometrics need a Blogger?

Filed under: bibliometrics,blogosphere,webometrics — admin @ 1:12 pm

Whilst searching on Google Blog Search for ‘webometrics’ I noticed that the usual webometric blogs are listed as ‘Related Blogs’:

As I had just been blogging on the subject of bibliometrics, I decided to see what the related blogs on that topic were. Surprisingly there aren’t any:

[Although two blogs are 'related' to Scientometrics].

If blogs are a useful way for sharing the latest news and information in a particular discipline, as well as the promotion of a discipline, then surely bibliometrics would benefit from the odd bibliometrician blogging occasionally [...for the sake of inter-disciplinary relations I will eschew the joke about bibliometricians being odd]. Admittedly the webometric blogs are not the best example of academic blogging, but it is a burgeoning online community of sorts.
