Skip to main content

The facebook graph

Putting "six degrees of separation" to the test with graph theory on a very large scale, Facebook analyses the connections between 10% of the worlds population.

Original paper: http://arxiv.org/abs/1111.4503

As one would expect the network is nearly fully connected, with 99.91% of the nodes belonging to a single connected component. The largest subgraph not connected to the major graph consists of about 2000 users.

The degrees of separation for any two facebook users is closer to 4 than 6. In 92% of all pairs of users on facebook; a friend of your friend knows a friend of my friend.

The graph is inherently sparse, duein part to our social ineptness at connecting with any significant fraction of the global population and also by facebook limiting the maximum number of friends to 5000. The median number of friends (degree) globally was 99.

The graph is also highly clustered - "for a median user, 14% of all their friend pairs are themselves friends", and interestingly this "drops rapidly for users with close to 5000 friends, indicating that these users are likely using Facebook for less coherently social purposes and friending users more indiscriminately."

The clique problem is to find the largest subset of people who all know each other, the study calculates this sparsity of the neighborhood graphs by measuring the degeneracy for each degree. Two examples highlight the findings of just how dense the local neighborhood graphs are:
* A node with degree 100 (a user with 100 friends) has an average degeneracy of 15, meaning on average 16 of the 100 friends each know at least 15 other mutual friends.
* Users with 500 friends have an average degeneracy of 53, they have at least 54 friends who all know 53 of their other friends.

The fact that neighborhood graphs center around sizable dense cores is clearly how facebook can offer such good friend suggestions and smart lists - I'm sure could be used to improve sharing between users major cliques (or ahem *circles*).



Pictured is the cumulative degree distribution showing the number of nodes (users) who have degree (number of friends) greater than k. The number ofunique friends added per additional friend was to approximate a linear line of best fit: 355*k-15057. So in theory I have ~205398 unique friends-of-friends.

Some other very interesting revelations: "until you have nearly 700 friends, your (average) neighbor has more friends than you""a positive correlation does exist between degree and logins... So since your friends have more friends than you do, they also login to Facebook more than you do."

While there are roughly 30 million fewer active female users on Facebook, a random neighbor of a user is slightly more likely to be female. Why? Because the average female degree (198) is larger than the average male degree (172).

All the path length crunching was done on a 24-core machine with 72 GiB of memory and 1 TiB of disk space, the edge graph comprised ~69 billion friend connections between 721 million users and took up 345 GB at 20 bits per arc and was traversed by a new algorithm called HyperANF.

Comments

Popular posts from this blog

Matplotlib in Django

The official django tutorial is very good, it stops short of displaying
data with matplotlib - which could be very handy for dsp or automated
testing. This is an extension to the tutorial. So first you must do the
official tutorial!
Complete the tutorial (as of writing this up to part 4).

Adding an image to a view

To start with we will take a static image from the hard drive and
display it on the polls index page.
Usually if it really is a static image this would be managed by the
webserver eg apache. For introduction purposes we will get django to
serve the static image. To do this we first need to change the
template.



Change the template
At the moment poll_list.html probably looks something like this:


<h1>Django test app - Polls</h1> {% if object_list %} <ul> {% for object in object_list %} <li><a href="/polls/{{object.id}}">{{ object.question }}</a></li> {% endfor %} </ul> {% else %} <p>No polls are available.</p> …

Homomorphic encryption using RSA

I recently had cause to briefly look into Homomorphic Encryption, the process of carrying out computations on encrypted data. This technique allows for privacy preserving computation. Fully homomorphic encryption (FHE) allows both addition and multiplication, but is (currently) impractically slow.

Partially homomorphic encryption just has to meet one of these criteria and can be much more efficient.
An unintended, but well-known, malleability in the common RSA algorithm means that the multiplication of ciphertexts is equal to the multiplication of the original messages. So unpadded RSA is a partially homomorphic encryption system.

RSA is beautiful in how simple it is. See wikipedia to see how to generate the public (e, m) and private keys (d, m).

Given a message x it is encrypted with the public keys it to get the ciphertext C(x)with:

C(x)=xemodm
To decrypt a ciphertext

Bluetooth with Python 3.3

Since about version 3.3 Python supports Bluetooth sockets natively. To put this to the test I got hold of an iRacer from sparkfun. To send to New Zealand the cost was $60. The toy has an on-board Bluetooth radio that supports the RFCOMM transport protocol.



The drive protocol is dead easy, you send single byte instructions when a direction or speed change is required. The bytes are broken into two nibbles: 0xXY where X is the direction and Y is the speed. For example the byte 0x16 means forwards at mid-speed. I was surprised to note the car continues carrying out the last given demand!

I let pairing get dealt with by the operating system. The code to create a Car object that is drivable over Bluetooth is very straight forward in pure Python:

importsocketimporttimeclassBluetoothCar:def__init__(self,mac_address="00:12:05:09:98:36"):self.socket=socket.socket(socket.AF_BLUETOOTH,socket.SOCK_STREAM,socket.BTPROTO_RFCOMM)self.socket.connect((mac_address,1))def_write(self,data_byte):…