Classifying the web: Herding ADHD cats
When it comes to boring jobs I like to think I have had some of the worst: taking the shells off hard-boiled eggs, taking the green bits off tomatoes, and, most recently, classifying web links. Yes, I can classify the links at home with a constant supply of coffee and the music of my choice, but it is still one of the most boring jobs. The reason: web pages come in every imaginable form, mostly with no discernible purpose, with links placed just because the site owner can. Classifying the web is like herding ADHD cats.
The good and interesting sites that we visit every day are surrounded by a web of crap that we usually only trip across if we are unlucky. These are not necessarily offensive sites, just sites that are absolute rubbish: spam, half-formed, badly written, orphaned. Classifying the web means that we have to wallow in this web of crap. It's not like classifying a library of books, but rather like classifying a whole world of which 90% is the council rubbish tip.
Labels: classifying, link analysis, links, webometrics

