
Publicly query facebook using Python


In my COSC lab today a few students were asking about doing something "real" and "cool" with Python, something that isn't easy in Excel. After a bit of a think I came to the conclusion that getting data from the internet is a "real" enough problem. As for "cool", since most people seem to have Facebook open in the background during labs, I thought getting some real live data off Facebook could be interesting.
First a disclaimer or two.
  • Don't just run random code without reading it and satisfying yourself that it's not trying to delete your operating system or do anything sinister!
  • I don't make the test or the assignment. This post is not at all related to them, except for the fact that it uses Python. (Okay, it uses a for loop to iterate over a list. If nothing else in this post is of interest to you, learn how to do that!)
So if anyone is still reading, let me introduce the problem, then look into how we can solve it. We will take a quick look at Facebook's Graph API, then finally see how it all ties together in a pretty short snippet of Python. There are two modules from the Python standard library that you probably haven't seen, so I'll briefly touch on those as well.
Onwards to the problem: Let's say I'm very interested in comparing how many fans various public pages on Facebook have. Maybe I am very pedantic and I check all the time, at least once an hour. But I really hate searching for the page on Facebook each time to check how many fans there are. My tech friend told me about bookmarks, so I bookmarked each of the pages I check, but it still takes me too long. All the other information on the page is irrelevant to me... what I really want is to write a program that cuts through the crap, so to speak, and gives me the data. All I require for each page I'm interested in is the current number of fans.
Facebook has an application programming interface (API) called the Graph API. Basically it connects everything on Facebook to anything else on Facebook. For example, the official Facebook page for the Facebook Platform has the id 19292868552, so you can fetch the object at https://graph.facebook.com/19292868552. Alternatively, if you know the username (and the page/user has one), you can fetch the object from https://graph.facebook.com/platform. If you clicked on those links you will have noticed that the data looks very similar to a Python dictionary - this is data in the JSON format. If you know your own username on Facebook, try seeing what is publicly known about you - just replace platform with your username. (Note: since accessing private data on Facebook requires secure authentication, we are just going to look at public pages.) To see more about the Facebook Graph API go to http://developers.facebook.com/docs/api
I imagine you have at least peeked at the code below by now, and line one should now make sense. If we want to decode JSON data, aren't we lucky that Python has a built-in json module? Let's use it!
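To get a feel for what "decoding" means, here is a small sketch that decodes a hand-written JSON string. The sample text is made up for illustration (it is not a real response from Facebook), and it is written for Python 3, so print is a function here.

```python
import json

# Made-up sample, shaped like the kind of JSON a page object might return.
# The field names and values are assumptions, not real Facebook data.
sample = '{"id": "12345", "name": "Example Page", "fan_count": 42}'

page = json.loads(sample)      # decode the JSON string into a Python dict

print(type(page).__name__)     # the result is an ordinary dictionary
print(page['name'])            # look things up by key, as usual
print(page['fan_count'] + 1)   # JSON numbers come back as Python numbers
```

Once the data is a dictionary, everything you already know about dictionaries applies to it.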
Oh wait, before we can decode the JSON data we need to get access to it in Python. Just like in our earlier labs where you opened a CSV file by calling open with a filename, we open a website by its URL. The only function we use from urllib2 is urlopen. It takes an address (URL) as its parameter and returns a file-like object; then, instead of calling readlines like we did in lab 6, we can call read. At this stage we have the data, but as a single large string.[1]
Line 7 creates a Python dictionary out of the JSON data found at the address we specify. No, I didn't know how to do that; I looked it up - the documentation is your friend. Google for "python json" and the first link will be the official documentation (including examples). Scan through that and you will find a function loads that will load Python data from a string of JSON, and a function load that will load Python data from a file-like object containing JSON. That is why the code passes the result of urlopen straight to load rather than calling read itself. An object created by urlopen behaves very much like a file: if we wanted, we could call readlines() on it, just like we could for a file object created with open.
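The difference between load and loads is easy to see offline. In this sketch an io.StringIO object stands in for the object urlopen would return. That stand-in, and the sample text inside it, are assumptions made so the example runs without a network connection. It is written for Python 3.

```python
import io
import json

# Made-up JSON text; not a real response from Facebook.
text = '{"name": "Example Page", "fan_count": 42}'

# Way 1: read everything into one big string, then decode it with loads.
fake_response = io.StringIO(text)   # stand-in for a urlopen object
as_string = fake_response.read()
page_a = json.loads(as_string)

# Way 2: hand the file-like object straight to load, which reads it for us.
fake_response = io.StringIO(text)   # fresh object, the first one is used up
page_b = json.load(fake_response)

print(page_a == page_b)             # both ways build the same dictionary
```

Way 2 is what the snippet in this post does, just with a real urlopen object instead of the stand-in.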

import json
import urllib2

def load_facebook_page(facebook_id):
    '''Return a dictionary of data from a facebook page id or username'''
    addy = 'http://graph.facebook.com/' + facebook_id
    return json.load(urllib2.urlopen(addy))

def print_fans(facebook_page_id):
    '''Print the name and number of fans of a facebook page'''
    facebook_page = load_facebook_page(facebook_page_id)
    print facebook_page['name'], 'fans: ', facebook_page['fan_count']

page_ids = [ 'pythonlang', '62842406160', '63723325087' ]
for facebook_id in page_ids:
    print_fans(facebook_id)


Hmm, writing this description has taken about four or five times as long as writing the code! It really turned into an essay, oops. Luckily everything after line 8 should be very straightforward. Make up some problems and improvements. A simple one to start with would be printing out the address of the Facebook page as well as its name and the number of fans.


[1] The original module was called urllib, but later people decided to write a new version that was so different from the original that they called it urllib2. In Python 3 the old urllib is being thrown out and urllib2 becomes part of a new urllib package (urlopen lives in urllib.request) - complicated much?
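For anyone on Python 3, here is a rough sketch of how load_facebook_page might look there, with urlopen imported from urllib.request. The opener parameter and the fake_urlopen helper are hypothetical additions for this sketch (they are not part of the snippet above); they let the function be tried with made-up data instead of the real network.

```python
import io
import json
import urllib.request

def load_facebook_page(facebook_id, opener=urllib.request.urlopen):
    '''Return a dictionary of data from a facebook page id or username'''
    addy = 'http://graph.facebook.com/' + facebook_id
    # In Python 3 the response gives bytes rather than a str;
    # json.load copes with that on Python 3.6 and later.
    return json.load(opener(addy))

def fake_urlopen(addy):
    '''Hypothetical stand-in for urlopen that returns made-up data'''
    body = json.dumps({'name': 'Example Page', 'fan_count': 42,
                       'asked_for': addy})
    return io.BytesIO(body.encode('utf-8'))

# Try the function without touching the network.
page = load_facebook_page('pythonlang', opener=fake_urlopen)
print(page['name'], 'fans:', page['fan_count'])
```

Leaving out the opener argument makes the function use the real urlopen, the same way the Python 2 version does.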
