Webometric Thoughts

April 4, 2010

The 4am Project

Filed under: 4amproject — admin @ 6:16 pm

Social media is great for bringing a diverse set of people with similar interests together for a particular project. An excellent example has been Karen Strunks’ extremely successful 4am Project:

The aim of the 4amproject is to gather a collection of photos from around the world at the magical time of 4am. Everyone can take part and join in! All you need is a camera. We want to see what you see at that moment in time on that one day. What’s your view at 4am?

Obviously, as a man who needs at least eight hours’ sleep a night, my view is that I should have been fast asleep dreaming of unicorns or some such tosh. However, my girlfriend had other ideas. Although I contemplated sending her out into the night with hundreds of pounds’ worth of photographic equipment to confront the last drunken stragglers staggering home from the pubs and clubs of Wolverhampton, I knew everyone would blame me if she ended up mugged or dead in a ditch (however misplaced such blame would be).

It’s been about 15 years since I was serious about photography: with multiple lenses, filters, films, and access to a dark-room. 4am didn’t strike me as the best time to start again, so I went along purely in the role of observer – with the exception of ‘twitpicing’ a single photo from the worst camera-phone in the world at 4am:

The world is very different at 4am, and all in all it was a pleasant stroll around Wolverhampton’s West-End:

Without a doubt, the most interesting – and least tiring – part of the day has been watching some of the other pictures, posts, and films that have been put online throughout the day.

-Lee Allen’s video of other 4am participants in Wolves
-4am Project Flickr Group
..and of course…
-My girlfriend’s view of the world at 4am

March 18, 2010

Welcome StumbleUpon – and other members of my recent spike

Filed under: Stumbleupon,techcrunch — admin @ 10:57 am

Unsurprisingly, my Webometric Thoughts aren’t massively popular. There are few people who start the day checking the BBC, the Guardian, and then Webometric Thoughts. However, over the last few days my traffic has gone through the relative roof: from a steady 100 unique visitors a day, it leaped to 602 on Tuesday!

Way beyond the previous high of 262. The reason: For a brief moment I was the TechCrunch pin-up boy, thanks to my (now-very-old) QR code T-shirt – nb. it goes without saying that this rather large company that clears $200,000 a month (according to Wikipedia) didn’t bother asking my penniless permission.

What’s particularly interesting is that hardly any of the traffic has come directly from TechCrunch: only 112 of the visits over the last three days. Instead the traffic has been mostly a massive surge of visits to my home page from StumbleUpon. I’m not sure why, but nonetheless – Hello StumbleUpon Users *waves*

March 9, 2010

How bad is Chatroulette?

Filed under: Chatroulette,Mr Shifter — admin @ 10:15 pm

Everywhere I turn at the moment there seems to be a story about Chatroulette.com. Press a button and you are in a random video chat with a stranger somewhere else in the world. Unsurprisingly it is painted as the latest sign of the world going to hell in a handcart: “Who will protect the children?”

As a particularly unsocial social media researcher I decided to do a quick quantitative study of first impressions of the people I came across on the site: clothed or naked/obscene, male or female. As I didn’t particularly want to engage with anyone, but needed to put the web cam on to encourage the broadest cross-section, I set it up for Mr Shifter:

Out of 100 web cams in which the subject was identifiable:

  • 79% were men.
      - 5 contained more than one man.
      - 11 were obscene.
  • 10% were female.
      - 2 contained more than one woman.
      - 1 was obscene.
  • 2% were mixed sex groups.
  • 9% were objects
      - mostly signs saying “show me you boobs”.

In addition, I also came across one camera supposedly of a man who had just hanged himself…I wasn’t too sure where to place that one.
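
For anyone who wants to play with the tallies above, here is a minimal sketch that checks the categories add up to the 100 web cams and works out the share of obscene cams within the two groups where I counted them. The variable and function names are just illustrative, and nothing here goes beyond the numbers already listed.

```python
# Tallies copied from the list above (100 web cams with an identifiable subject).
counts = {"men": 79, "women": 10, "mixed-sex groups": 2, "objects": 9}

# Obscene cams were only tallied for the men and women categories.
obscene = {"men": 11, "women": 1}

# Sanity check: the four categories should account for every web cam.
total = sum(counts.values())


def obscene_rate(group):
    """Fraction of cams in a group that were classed as naked/obscene."""
    return obscene[group] / counts[group]


print(f"Total web cams: {total}")
for group in obscene:
    print(f"{group}: {obscene_rate(group):.1%} obscene")
```

With only 100 cams, and just 10 of them women, the per-group rates are far too rough to read anything into.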

So what did I find out? The world is mostly just looking to talk, there are some weirdos out there, and one bloke who wanted to see the monkey dance…and was thrilled when he obliged.

Academic Search Engine Optimization: An inevitable evil?

Filed under: Academic SEO,REF,Scientific publishing — admin @ 2:01 pm

The money available for public science is finite, and it is understandable that governments want to get value for public money spent, and show the value in the form of bibliometric and webometric indicators. Unfortunately scientists are far from perfect, and the indicators and metrics that are meant to reflect the merits of an academic’s work can quickly become the focus of the academic’s work.

I’ve just finished reading Academic Search Engine Optimization (ASEO): Optimizing Scholarly Literature for Google Scholar & Co. (via @research_inform), which gives advice on making sure your journal articles are indexed and highly ranked by academic search engines (e.g., Google Scholar). There are numerous points I disagree with on both an ethical and a practical level:

  • “…tools that help in selecting the right keywords, Google Trends, Google Insights, Google Adwords”
  • “Synonyms of important keywords should also be mentioned a few times in the body of your text, so that the article may be found by someone who does not know the common terminology used in the research field.”

When I write an academic paper my primary audience is academics in my specialised field, not the wider public that are likely to use different vocabulary and dominate services like Google Trends by their sheer numbers. As an academic reading a paper I wouldn’t appreciate the introduction of inconsistency and ambiguity through the use of synonyms, which are necessarily near-synonyms in the precise scientific world.

  • “..to achieve a good ranking in Google Scholar, many citations are essential. Google Scholar seems not to differentiate between self-citations and citations by third parties.”

Self-citation has always been rife and needs little encouragement. Later they state that “…any articles you have read that relate to your current research paper should be cited”; although surely discretion is an important factor unless we are going to shoe-horn in crap and further exaggerate the Matthew effect of the high ranked papers.

  • “…publish the article on the author’s home page…an author who does not have a Web page might post the article on an institutional Web page”

Ignoring the curious turn of phrase, the general consensus is that the vast majority of academics should publish in their institutional repository irrespective of whether they have their own web site. The institutional repositories should have the procedures in place to ensure long-term archiving.

  • “…an article that includes outdated words might be replaced by either updating the existing article or publishing a new version on the author’s web site.”

As the authors acknowledge “…it may be considered misbehaviour by other researchers.” At last we have a point we agree on.

As you have probably guessed from the above criticisms, I thought that the article was a piece of crap. Academic SEO should in no way affect how we write an academic paper, or the subjects we choose to write about. Unfortunately academic SEO is a topic that is likely to get a lot more attention amongst bad scientists if another practice I recently heard of takes off: paying academics bonuses per article. A colleague told me last week how his former university had a pot of money from which academics were paid €4,000 (split between the number of authors) for articles published in certain ‘quality’ journals. It is a small step to start paying individuals for articles that reach a certain threshold of citations, at which point we will have finally dumbed-down science.

“Researchers need to think seriously about how to get their articles indexed by academic search engines” – No, they need to think seriously about doing worthwhile research and writing quality publications. If your focus is on SEO then you are in the wrong field.

March 5, 2010

A quick SPARQL of Dbpedia.org says I’m past it!

Filed under: Linked Data,SPARQL,dbpedia — admin @ 5:33 pm

I’ve spent the last couple of days having a play around with some of the Linked Data that is increasingly being made available online – data that is made available through dereferenceable URIs. One of the most interesting sources is Dbpedia.org, a project that extracts structured data from Wikipedia. Whilst it suffers from a lack of consistency, its crowd-sourced nature potentially offers unique insights into the nature of society (or at least the world as Wikipedia users see it).

Today I downloaded a list of all the pages of people in dbpedia with dates of birth in the 20th century. Requests were sent using the SPARQL query language – with only one month requested at a time as dbpedia only provides the first 1,000 results for each query.

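PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?page ?dob WHERE {
  ?s foaf:page ?page .
  # The birth-date predicate was lost from this post; the URI below is an assumption.
  ?s <http://dbpedia.org/ontology/birthDate> ?dob .
  FILTER (?dob >= "1900-01-01"^^xsd:date) .
  FILTER (?dob <= "1900-01-31"^^xsd:date) .
} LIMIT 1000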
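
Asking for one month at a time means a lot of near-identical queries, so they are easier to generate than to type. The following is a rough sketch of how that might be done in Python, not the exact script I used: the function names are made up for illustration, the birth-date predicate URI is an assumption (swap in whichever property the data actually uses), and taking the century as 1900 to 1999 is simply an extension of the January 1900 example above. It only builds the query strings and does not send anything to dbpedia.

```python
import calendar
from datetime import date

# Assumption: the predicate holding the date of birth; replace as needed.
ASSUMED_BIRTH_PREDICATE = "http://dbpedia.org/ontology/birthDate"

QUERY_TEMPLATE = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?page ?dob WHERE {{
  ?s foaf:page ?page .
  ?s <{predicate}> ?dob .
  FILTER (?dob >= "{start}"^^xsd:date) .
  FILTER (?dob <= "{end}"^^xsd:date) .
}} LIMIT 1000"""


def month_ranges(first_year, last_year):
    """Yield (first_day, last_day) for every month in the inclusive year range."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            # monthrange returns (weekday of the 1st, number of days in the month)
            last_day = calendar.monthrange(year, month)[1]
            yield date(year, month, 1), date(year, month, last_day)


def build_queries(first_year=1900, last_year=1999,
                  predicate=ASSUMED_BIRTH_PREDICATE):
    """Return one SPARQL query string per calendar month."""
    return [
        QUERY_TEMPLATE.format(predicate=predicate,
                              start=start.isoformat(),
                              end=end.isoformat())
        for start, end in month_ranges(first_year, last_year)
    ]


queries = build_queries()
print(len(queries), "queries generated")
print(queries[0])
```

Any month with more than 1,000 matching people would still be cut off at the limit, so a result set of exactly 1,000 rows is a sign that the month needs splitting into smaller date ranges.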

It’s not particularly surprising to find that in the current celebrity-obsessed world there are more Wikipedia-famous people towards the end of the century than at the beginning, and that there are relatively few people under the age of twenty.

At 35 it would seem as though my best years for getting my own Wikipedia page are behind me – although as I was never counting on my sporting prowess, there is probably still a chance.

The real power of Linked Data comes not from these data sets in isolation, but investigating how they link together…but you have to start somewhere.

March 4, 2010

Microscopes and Micrographia

Filed under: Microscope,Robert Hooke,entomology — admin @ 9:52 am

My home office is increasingly turning into a home lab: circuit boards, sensors, switches, wires, wire cutters, soldering iron, even a robot. My latest acquisition is a USB digital microscope with 200x magnification. I’ve been tempted by the thought of a USB microscope for a while, and whilst there are more powerful microscopes out there, at £29.99 it would have been churlish not to give this one a go.

Unbeknown to the Maplin’s sales assistants, their sale was made that much easier by the fact I am currently reading Lisa Jardine’s The Curious Life of Robert Hooke: the man who through his Micrographia (1665) showed the world at large the hidden details they had never seen before, painstakingly drawing by hand the objects he placed under his slides.

Today the man on the street can pick a USB microscope off the shelf, and within minutes share his close-ups of the world. It remains to be seen, however, whether it will encourage a generation of entomologists, or navel gazers.

January 22, 2010

Semantic Webometrics – A few thoughts

Filed under: semantics,webometrics — admin @ 9:37 am

The other day an academic colleague asked what I was working on at the moment. My answer included semantic webometrics and, unsurprisingly, he wanted some more detail. However, ‘working on’ would be a bit of an exaggeration; ‘have a few ideas but nothing on paper yet’ would have been more appropriate. As such I thought I’d write down some of my rough thoughts on semantic webometrics.

For those who may have stumbled upon this blog from a non-webometric background, Webometrics as defined by Björneborn (2004), and as used by most of the webometrics community, means the:

…study of the quantitative aspects of the construction and use of information resources, structures and technologies on the Web drawing on bibliometric and informetric approaches.

Many of these quantitative studies have focused on hyperlinks. For example, investigating whether there is a correlation between a university’s inlinks (a.k.a. backlinks) and a university’s research ranking, or whether the interconnectedness of organisations in a region (as seen through interlinking web sites) can give an indication of a region’s level of innovation [outrageous self-citation].

One of the problems with many of these link-analyses is that they include a lot of noise. For example, when counting a university’s inlinks you will be counting both those from an academic highlighting a university’s quality research, and those from the disgruntled student highlighting his most hated tutor. Traditionally we have tried to understand the extent of this noise through large scale content analysis – the extremely tedious manual classification of web links and web pages.

The semantic web
A semantic web is one where information on the web is structured so that it is meaningful to computers. Well known examples of the semantic web include the FOAF ontology, which allows people to express their relationships with one another (e.g., the FOAF of Tim Berners-Lee), and the use of microformats for certain types of structured content including contact details (as included at www.davidstuart.co.uk) and reviews (which are now indexed by Google as Rich Snippets). This extra information can be used to reduce the amount of noise and enable meaningful webometric studies.

Semantic webometrics
So when I say semantic webometrics I mean – webometric studies that make use of the additional information included in an increasingly semantic web.

For example, a semantic webometric study of the connection between an institution’s inlinks and research ranking would take into consideration who had placed the links and the attributes that they had associated with them. A semantic webometric study of the relationships between organisations would look at the explicit relationships contained in FOAF files as well as the implicit information on web pages.

Unfortunately there is relatively little semantic information embedded in the majority of web pages/sites, and where it is widespread, e.g., with the nofollow link attribute, webometricians have yet to develop the tools to make use of it.
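
To give a flavour of what such a tool might look like, here is a minimal sketch using only Python’s standard library. It separates the links on a page into those marked nofollow and those that are not, so that a link count could treat the two groups differently. The class name, function name, and sample HTML are all made up for illustration; a real study would need to fetch and clean pages at scale, which this does not attempt.

```python
from html.parser import HTMLParser


class LinkRelCounter(HTMLParser):
    """Collect hrefs from <a> tags, split by whether rel contains nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list of tokens; compare case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def classify_links(html):
    """Return (followed_hrefs, nofollow_hrefs) for a chunk of HTML."""
    counter = LinkRelCounter()
    counter.feed(html)
    return counter.followed, counter.nofollow


# Hypothetical page fragment, for illustration only.
SAMPLE_HTML = """
<p><a href="http://example.ac.uk/research">A research group worth reading</a></p>
<p><a rel="nofollow" href="http://example.ac.uk/">link left in a comment</a></p>
<p><a rel="external NoFollow" href="http://example.ac.uk/staff">untrusted link</a></p>
"""

followed, nofollow = classify_links(SAMPLE_HTML)
print("followed:", followed)
print("nofollow:", nofollow)
```

Whether a nofollow link should be discarded or just weighted differently is exactly the sort of question a semantic webometric study would have to answer; the sketch only makes the distinction visible.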

As such we need to take an information-centred approach to semantic webometric research rather than a problem-centred approach. Whilst still small, there is an increasing amount of semantic data being embedded in the web all the time; webometricians need to investigate what is available and how they can use it.

January 4, 2010

Predictions: What are they good for?

Filed under: 2010 Predictions — admin @ 12:07 pm

At this time of year (or rather a few weeks ago if they weren’t drowning under a pile of work) technology bloggers all around the world make predictions about the coming year, and reflect upon the predictions they made the previous year. Looking back on my previous predictions I can’t help but realise how slowly the world of technology moves.

Last year’s predictions
1. N97 takes Nokia back to the top of the pile. Unfortunately I have only come across one person with an N97 in the past year; Apple and its apps continue to beguile everyone in their path.
2. Distributed social networks will shrink Facebook traffic. Unfortunately Google Wave launched too late in the year, and with too many problems, for it to make any real impact. But the notion of a distributed system has been well and truly planted in people’s minds.
3. Project Kangaroo will hit UK desktops. The legal watching of video online is increasing, with new entrants in the market such as Blinkbox, but unfortunately Project Kangaroo fell foul of the Competition Commission.
4. The general public continue to ignore QR codes. Despite my pessimism QR codes have actually started to creep into some unexpected places. For example, the University of Bath in numerous places, including their library catalogue. Whilst they have become more popular than I imagined, they are still ignored by most of the public.
5. No Google alternative will emerge. Yahoo Search closes up shop, Bing has more money than sense, and Google marches on.

This year’s predictions – on a similar theme
1. iPhone + Augmented Reality = Increased Market Share. I hate the iPhone because if you want to install anything on an iPhone you have to check it’s OK with Apple first, for which they will take a 30% cut of the price of the app. Unfortunately the centralised app-store is the reason so many people like it. It simplifies the process of downloading new applications, and as we see an increase in glossy augmented reality mobile applications the iPhone will continue to be perceived as the obvious choice.
2. Google Wave takes off. Despite hating Google, I’m backing Google Wave for two reasons: i) We need something better than email, ii) I really want to see an open distributed system. It still has a lot of teething problems, but nothing that can’t be overcome.
3. Project Canvas fails. Project Kangaroo failed because of the complaints of Murdoch, and I’m sure Project Canvas will as well, especially if we see a Tory government after the next election.
4. No change in search. Market share will stay the same and no one will embrace the potential of the wisdom of the crowd. Search strikes me as one of the more antiquated areas of the web, with little real innovation occurring. I think things will start to change in 2011, if the semantic web takes a foothold this year.
5. The year of the Semantic Web. After years of talk, I have the feeling that this could be the one where we start to see the semantic web making an impact both through the opening up of large data sets, and the marking up of web pages with microformats. As someone who is fed up with poking and tweeting, I’m looking to the semantic web to inject a bit of life into the web.

As for Twitter, I don’t really care. I’m bored of it now.

January 3, 2010

2009 in Books: 47

Filed under: books — admin @ 2:36 pm

Whilst I have little doubt that the web is a wonderful thing, I personally waste a lot of time online reading half-formed, half-baked, off-the-cuff opinions. There are a lot of things that are better said in 300 pages than 140 characters. Unfortunately my mindless clicking online leaves far less room for books than I would like. At a minimum I would expect to read 50 books in a year. Unfortunately (thanks to that ever encroaching web) 2009 saw me read a mere 47, or rather, finish 47 books; my shelves are littered with half-read books which, if I return to them, I will feel it necessary to start again from the start.

The work related books: 16
‘Work’ can be stretched to cover a multitude of subjects that I am interested in, from sociology, through the narrative, to Second Life.

Unfortunately some of the work related books are far less enjoyable. Often (although not always) these were the ones that I had offered to review for a journal and therefore have to struggle through to the end.

Whilst some books are always worse than others, without a doubt Knowledge Networks: The Social Software Perspective (Premier Reference Source) was not only the worst book I read this year, but one of the worst publishing efforts I have ever seen.

Other non-fiction: 19
There isn’t much of a theme to the rest of my non-fiction, although I possibly got a bit carried away with books about Samuel Johnson.

The one with least merit is The Impulse Factor: Why Some of Us Play it Safe and Others Risk it All; don’t even think about buying this book. For the keen-eyed wondering what happened to book number 19: it was HOW TO USE BOOKS. I can only presume that it was the lack of a picture that meant Amazon wouldn’t let me add it to a widget.

The Fiction Books: 12
Curiously my fictional reads of 2009 both started and ended with an Adrian Mole, and there is the usual inclusion of personal favourites such as Grisham and Irving. But beyond that it is a curious selection of odds and ends.

Clumped together it looks a slightly bizarre collection, especially the fiction shelves (I believe Mr Majeika was free in a cereal box a previous year), but then again I suppose a lot of people’s do. As with every other year I shall resolve to read far more in 2010; maybe I should also resolve to read better books in 2010.

December 27, 2009

The Cat, the Bullfighter, and Google Books

Filed under: Erving Goffman,Google Books,frame breaking — admin @ 3:37 pm

As a general rule I take the web for granted. Although I’m old enough to remember [a lot of] life before the web, because I was aware of services like Prestel and had dialed up the local BBS years before, I merely saw the web as a natural progression.

Occasionally, however, an inconsequential event does make me stop and realise how much we really take for granted. Last night I was curled up with Erving Goffman’s Frame Analysis: An Essay on the Organization of Experience (as I’ve mentioned before his works provide useful frameworks for understanding the social web). On page 424 he uses an example of a cat ‘frame breaking’ as it circled the bull ring whilst the bullfighters were hiding behind barriers from the bull. Despite it being nearly midnight I could go over to my computer, go to http://books.google.com/ and browse or search my way to the relevant issue of Life magazine to see the full-page picture of the event.

A book published in 1974, referenced a photograph published in 1955, and I could see that photo in a matter of moments. Something that would have been impossible for the vast majority of people who have ever read Frame Analysis over the past 35 years.
