Webometric Thoughts

January 25, 2009

Flickr API: If you don’t want to give us the data, just tell us!

Filed under: API,Flickr — admin @ 12:27 pm

Application Programming Interfaces (APIs) are a brilliant way for researchers (as well as commercial developers) to use the data of the big web organisations in new and innovative ways, in a controlled and ethical manner. Whilst there are usually limitations, we find ways of working within the boundaries we are set. What is annoying, however, is finding that a service isn’t being particularly honest about those boundaries. This post’s wrath is aimed at Flickr’s API.

Whilst many API services limit the number of results you can view, this is usually clearly set out in the documentation. For example, most search engines only allow you to view the first thousand results. Flickr, however, lets you keep requesting further pages, only to start sending back repeated pages once you go past roughly 4,500 results. This can be clearly seen in the two pictures below from the Flickr API Explorer for flickr.photos.search. The first shows a partial screenshot of the results for the ninth page of 500 results for the tag ‘web’:

The second shows a partial screenshot of the results for the tenth page of 500 results for the tag ‘web’:

Basically the same results with a different page number.
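The same comparison can be automated rather than eyeballed in the API Explorer. The following is a minimal sketch, not anything Flickr provides: it assumes you have already pulled the photo IDs out of each page of results, and the function name `first_repeated_page` is my own invention for illustration.

```python
from typing import Iterable, Optional, Sequence


def first_repeated_page(pages: Iterable[Sequence[str]]) -> Optional[int]:
    """Return the 1-based number of the first page that adds no new photo IDs.

    `pages` yields one sequence of photo IDs per results page, in request order.
    Returns None if every non-empty page contributed at least one unseen ID.
    """
    seen = set()
    for page_number, ids in enumerate(pages, start=1):
        new_ids = set(ids) - seen
        # A non-empty page made up entirely of IDs we already hold is a repeat.
        if ids and not new_ids:
            return page_number
        seen |= new_ids
    return None
```

Run over the pages as they arrive, this would flag the tenth page in the example above, since it carries the same IDs as the ninth.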

I wouldn’t mind the restrictions if they were clear. It may be stated in the small print somewhere, although I still haven’t seen it, but why would you send the same data again and again and claim it as different pages of results? It is still possible to collect all the results by using some of the other arguments, e.g., the minimum and maximum upload dates. It just means that I had to waste numerous hours collecting data again once the problem came to light. Flickr now owes me one Saturday.
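One way to organise the upload-date workaround is to keep halving the date range until each slice is small enough to page through safely. This is only a sketch under stated assumptions: `count_in` is a hypothetical callable that you would have to write yourself (for instance, by running a search restricted with min_upload_date and max_upload_date and reading the reported total), and the cap of 4,000 is simply a margin I have picked below the roughly 4,500 results where the repeats were observed.

```python
from typing import Callable, List, Tuple

# (start, end) as Unix timestamps; start inclusive, end exclusive in this sketch.
Window = Tuple[int, int]


def split_windows(
    start: int,
    end: int,
    count_in: Callable[[int, int], int],
    cap: int = 4000,  # assumed safety margin, not a documented Flickr limit
) -> List[Window]:
    """Bisect [start, end) until each window reports at most `cap` results."""
    # Stop when the window is small enough, or cannot be split any further.
    if count_in(start, end) <= cap or end - start <= 1:
        return [(start, end)]
    mid = (start + end) // 2
    return (
        split_windows(start, mid, count_in, cap)
        + split_windows(mid, end, count_in, cap)
    )
```

Each returned window can then be paged through as a separate search. I have not checked how the API treats the boundary timestamps, so it is safest to de-duplicate by photo ID when merging the windows.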

This serves as a useful reminder to all web researchers: Make sure the API is giving you the data it is claiming to give you.
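A cheap check along these lines is to compare what the API claims to have with what you actually ended up holding. The sketch below assumes you have recorded the total reported by the service and kept every photo ID you received; the names `CrawlAudit` and `audit_crawl` are hypothetical, made up for this example.

```python
from typing import Iterable, NamedTuple


class CrawlAudit(NamedTuple):
    reported_total: int  # total the API claimed for the query
    rows_received: int   # number of result rows actually returned
    unique_ids: int      # number of distinct photo IDs among those rows

    @property
    def duplicates(self) -> int:
        return self.rows_received - self.unique_ids

    @property
    def looks_complete(self) -> bool:
        return self.unique_ids >= self.reported_total


def audit_crawl(reported_total: int, photo_ids: Iterable[str]) -> CrawlAudit:
    """Summarise a finished crawl so that repeated results stand out."""
    ids = list(photo_ids)
    return CrawlAudit(reported_total, len(ids), len(set(ids)))
```

A large `duplicates` figure alongside `looks_complete` being false is exactly the pattern described in this post: plenty of rows coming back, but far fewer distinct photos than the service claims.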

7 Comments »

  1. I ran into this and found your page while looking for commentary on it. Then in one of my experiments I noticed that going in through the normal web search interface, if I try to go to any page from 188 on up, I get a page saying no photos were found for that search criteria. 188*24 images per page is 4512…so it's not just the API.

    Comment by Anonymous — June 19, 2009 @ 1:08 am
