Recently Elsevier published its vision for the Article of the Future. However, whilst it paid attention to graphical abstracts and integrated audio and video, it failed to mention one of the most important aspects: delays in the publication process. I am joint author on a paper that has just been accepted by the Journal of Librarianship and Information Science; unfortunately it won’t be published until 2011!
Angus, E., Stuart, D., & Thelwall, M. (2011, in press). Flickr’s potential as an academic image resource: an exploratory study. Journal of Librarianship and Information Science.
Abstract: Many web 2.0 sites are extremely popular and contain vast amounts of content, but how much of this content is useful in academia? This paper investigates the potential use of the popular web 2.0 image site Flickr as an academic image resource. The study identified images tagged with any one of 12 subject names derived from recognised academic subject categories in the three main ISI citation indexes. Image content analysis was used to determine the types of images found, and term-frequency analysis of associated tags was carried out to provide additional insights into the context behind image placement. The results show that Flickr can be used as a resource for subject-specific images in some subject areas, and that non-subject-specific images can also prove to be of value for individual academics.
Whilst you won’t be able to see the final version for a couple of years, you can nonetheless download the pre-peer-reviewed version here [.doc format].
Application Programming Interfaces (APIs) are a brilliant way for researchers (as well as commercial developers) to use the data of the big web organisations in new and innovative ways in a controlled and ethical manner. Whilst there are usually limitations, we find ways of working within the boundaries we are set. What is annoying, however, is finding that the service isn’t being particularly honest about those boundaries. This post’s wrath is aimed at Flickr’s API.
Whilst many API services limit the number of results you can view, this is usually clearly set out in the documentation; for example, most search engines only allow you to view the first thousand results. Flickr, however, lets you keep requesting further pages, only to start sending back repeated pages of results for anything over 4,500. This can be seen clearly in the two screenshots below from the Flickr API Explorer for flickr.photos.search. The first shows part of the results for the ninth page of 500 results for the tag ‘web’; the second shows part of the results for the tenth page of 500 results for the same tag: basically the same results with a different page number.
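To make the problem concrete, here is a minimal sketch of the sanity check that catches a paginated API repeating itself. The helper names are mine, and a simulated fetcher stands in for a real flickr.photos.search call: collect the photo ids from each page and stop as soon as a page adds nothing new.

```python
def ids_on_page(page):
    """Extract the photo ids from one page of results."""
    return {photo['id'] for photo in page}

def collect_until_repeats(fetch_page, max_pages=20):
    """Page through results, stopping when a page yields no new ids."""
    seen = set()
    for page_number in range(1, max_pages + 1):
        page_ids = ids_on_page(fetch_page(page_number))
        if page_ids <= seen:  # nothing new: the API has started repeating itself
            return seen, page_number
        seen |= page_ids
    return seen, max_pages

# Simulate Flickr's behaviour: pages 1-9 are distinct, page 10 onwards repeats page 9
pages = {n: [{'id': str(500 * n + i)} for i in range(500)] for n in range(1, 10)}
ids, stopped_at = collect_until_repeats(lambda n: pages[min(n, 9)])
# ids now holds 4,500 distinct photos; the repetition was caught on page 10
```

Run against the real API, a check like this flags the 4,500 ceiling immediately instead of after a wasted Saturday.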
I wouldn’t mind the restrictions if they were clear. Whilst it may be stated in the small print somewhere (which I still haven’t seen), why send the same data again and again and claim it as different pages of results? It is still possible to collect all the results by using some of the other arguments, e.g. the minimum and maximum upload dates; it just means that I wasted numerous hours collecting data again when the problem came to light. Flickr now owes me one Saturday.
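The upload-date workaround can be sketched generically. Assuming only a function that reports how many results fall in a date range (standing in for reading the total from a search restricted by minimum and maximum upload dates), keep halving the range until every window fits under the cap, then page through each window separately. The names and the cap constant are mine:

```python
CAP = 4500  # the point at which Flickr starts repeating pages

def split_into_windows(count_in_range, start, end, cap=CAP):
    """Return (start, end) windows that each contain at most `cap` results."""
    if count_in_range(start, end) <= cap or end - start <= 1:
        return [(start, end)]
    mid = (start + end) // 2
    return (split_into_windows(count_in_range, start, mid, cap) +
            split_into_windows(count_in_range, mid, end, cap))

# Simulated archive: 20,000 photos at evenly spaced upload timestamps
uploads = range(20000)
count = lambda s, e: sum(1 for t in uploads if s <= t < e)
windows = split_into_windows(count, 0, 20000)
# every window is now safely under the cap and can be paged in full
```

Each window costs one extra counting query, but every photo becomes reachable.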
This serves as a useful reminder to all web researchers: Make sure the API is giving you the data it is claiming to give you.
Python is a really simple programming language for the novice programmer. As such I held an afternoon’s “workshop” for a couple of PhD students in my front room. The aim of the workshop was to provide sufficient information about programming in Python so that at the end of the afternoon the user could:
- Install Python libraries
- Download information through various APIs
- Manipulate the downloaded information
As it was necessary to create an extensive slide show, covering everything from installing Python to getting data from the Yahoo API, I thought it might be of interest to other novice users who don’t know where to start.
It doesn’t necessarily include the quickest or most efficient way of doing things, but it is simple and does the job.
If you have any questions about specific points, feel free to ask…the questions can’t be more stupid than the questions the PhD students asked…and some of the slides could probably benefit from further explanation.
67% of Flickr members have no photos! Whilst Lotka’s law teaches us that the majority of contributors to a community make very few contributions, I was still surprised at the number of members with no photos; after all, I am not talking about visitors to the site, but about those who have taken the trouble to join. What is the point of joining Flickr if you are not going to put photos on the site?
Data was collected on the number of photos for 324 randomly selected users: 216 had no photos, a further 58 had fewer than 20, and only 50 had more than 20. Really I should look at whether these photo-less users are active in other ways (e.g., members of groups, leavers of comments), but this was little more than an aside as I spend my time messing about with Python. I have now loaded Python on my main computer as well as my Eee PC, and can barely believe how easy it is!
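For what it’s worth, the headline percentage falls straight out of those counts:

```python
# The arithmetic behind the headline figure, using the sample counts above
sample = {'no photos': 216, 'fewer than 20': 58, '20 or more': 50}
total = sum(sample.values())
share_empty = round(100 * sample['no photos'] / total)
# 216 of 324 users, i.e. 67%, had no photos at all
```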
Until yesterday I hadn’t really thought about programming on the Eee PC, but once I started looking I was surprised how easy it was: Unbeknown to me, it has had Python 2.4 and 2.5 sitting there the whole time! Despite not being a particularly competent programmer, I found Python to be very user friendly, and look forward to programming on the Eee PC in a variety of settings in the future. My first Python program was used to find random Flickr users:
import flickrapi
import random

api_key = 'XXXXXXXXXXXXXXX'
for counter in range(1, 1000):
    flickr = flickrapi.FlickrAPI(api_key, fail_on_error=False)
    # Build a candidate user_id: up to eight digits, '@N', then 00-19
    a = random.randint(1, 99999999)
    b = random.randint(0, 1)
    c = random.randint(0, 9)
    d = str(a) + '@N' + str(b) + str(c)
    photos = flickr.photos_search(user_id=d)
    if photos['stat'] == 'ok':
        print d
print 'done'
Webometric studies are always searching for ways of finding random users; unfortunately I have no idea how Flickr assigns its user_ids. O’Reilly’s “Flickr Hacks” says:
“…a string of numbers, followed by an at sign (@), an N, and two more numbers (often 00 or 01)…”
Not exactly specific. The program calculates a number up to eight digits long before the ‘@N’ and from 00 to 19 after the ‘@N’. Whilst most may be 00 or 01, I found them as high as 08. If anyone knows of any user_ids not included in these parameters, please let me know.
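Put another way, the id space the program samples from can be written as a regular expression. This is my reading of the format, not an official one, and the helper name is mine:

```python
import random
import re

# Up to eight digits, '@N', then a two-digit suffix from 00 to 19
USER_ID_PATTERN = re.compile(r'^\d{1,8}@N[01]\d$')

def random_candidate_id():
    """Generate one candidate user_id within the parameters described above."""
    return '%d@N%d%d' % (random.randint(1, 99999999),
                         random.randint(0, 1),
                         random.randint(0, 9))

candidates = [random_candidate_id() for _ in range(1000)]
# every generated id matches the pattern; '12345678@N22' would not
```

If real ids turn up outside this pattern (a suffix of 20 or above, say), only the `[01]` class needs widening.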
From 1,000 queries, only 10 valid users were identified, a 1 per cent hit rate. Not exactly efficient.
If, however, you (or your institution) are not a subscriber to Online Information Review, I have made a preprint available for your enjoyment HERE (Emerald has impressively liberal author publishing rights).
However much you may question the quality of many of the Flickr photos, and whether the vast majority are worth the space they take up (however cheap that is), there is no getting away from the fact that 2 billion is a massive number. The 2 billionth photo sums up so much of Flickr’s content: pleasant enough in a pseudo-arty fashion. Personally I would have loved the 2 billionth photo to be a big fat man sitting in his pants with absolutely no artistic merit at all…but maybe it was and Flickr just juggled the figures a little for the momentous occasion…and who could blame them?