Monday, December 20, 2010

The mathematics of a beat machine

2010 has been Swedish pop star Robyn's year. She's released a three-part album called Body Talk, been nominated to countless awards and broken new ground with her 3D music video with Twitter integration.

One of my favourite tracks on the second installment in the Body Talk series is We dance to the beat, the lyrics of which describe how we dance to the beat of all sorts of wonderful things, such as the beat of the continents shifting under our feet.

I'm a mathematician, so needless to say the line we dance to the beat of false math and unrecognized genius caught my attention. That line was one of the reasons that I listened to Body Talk pt. 2 over and over again (yes, I'm a bit of a geek). Luckily, the album is nothing short of amazing, just like parts 1 and 3, so the listening pleasure was far greater than what I would get simply from hearing a pop star mentioning mathematics.

In what must surely be a future textbook example of synchronicity, the words false math had been stuck in my head for a couple of days when the guys from the digital production company DinahMoe contacted me. As it turned out, We dance to the beat was in need of some real mathematics.

DinahMoe were developing a first of its kind interactive music experience in collaboration with Blip Butique. Released a week ago, it allows users to remix We dance to the beat and create a music video of sorts by combining sounds and visuals on the interactive beat machine site. The users search through a grid of video/sound clips and choose a combination of four videos and four sound clips. They can then interact with the clips to manipulate both audio and video, in perfect sync with the beat.

Users can share their remixes on Twitter or Facebook and look at the ever-growing seamless stream of remixes created by others. Every now and then the stream of remixes is interrupted by pre-programmed mixes with footage of Robyn, making the stream experience even cooler.

So, where do the mathematics enter into all of this? Well, there are 14 basic sound clips in the interactive beat machine and the creative team behind it wanted to construct a grid where each cell in the grid contained one clip. The users should be able find all possible combinations of 4 out of these 14 clips by looking at a subgrid with four cells (as in the picture above). The question, then, is how large the grid needs to be in order to fit all combinations and how the clips should be arranged so that no clip is ever paired with itself.

In order to answer this, we need to find out in how many ways we can choose 4 out of 14 clips. We could do this by listing all possible combinations (clips 1, 2, 3, 4, clips 1, 2, 3, 5, clips 1, 2, 3, 6, ... clips 11, 12, 13, 14), but that would (a) take too long and (b) bore us to death.

Not having a deathwish, I decided to make use of some tools from the mathematical branch of combinatorics. It just so happens that a classical combinatorial problem is the question "in how many different ways can we choose k out of n objects" (here k and n are whole numbers and k is less than or equal to n).

Luckily, others have solved this problem before us, and we can use their solution right away. The answer to the question can be written as what mathematicians arrogantly refer to as a "simple" formula, using what is called the binomial coefficient (the strange thing on the left hand side in the expression below). The formula is

which in this case, where n=14 and k=4 becomes (14*13*12*11)/(4*3*2*1)=1001. That is, there are 1001 possible ways to choose 4 out of 14 clips.

So, we need to create a grid where all 1001 combinations of four clips can be found by looking at a subgrid of size 2*2, that is, a grid with two rows and two columns. The clips much be placed in such a way that, for instance, clip number 1 never is found next to itself in the big grid - when the user looks at four cells there must be four different clips placed in them.

Consider the following small grids. In each of them there are two 2*2 subgrids that the user can look at.

In the grid to the left, the combinations of clips 1, 2, 4, 5 (the four cells to the left in the grid) and clips 2, 3, 5, 6 (the four cells to the right in the grid) are found. In the middle grid, the combinations are 1, 2, 4, 5 and 1, 2, 5, 6. In the grid to the right however, the combination 1, 1, 2, 4 is found, which isn't permitted. That's the kind of subgrids that we want to avoid.

Before we try to create the grid, we need to know how large it should be. How many subgrids of size 2*2 are there in a grid with a rows and b columns? Let's look at some examples:

From examples like this we can deduce that there is a general formula for the number of 2*2 subgrids: in a grid with a rows and b columns there are (a-1)*(b-1) subgrids of size 2*2. We need a grid with at least 1001 such subgrids in order to be able to fit all 1001 combinations in the grid.

It seemed reasonable to have a grid where a=b, that is, a grid with the same number of rows and columns. Then (a-1)*(b-1)=(a-1)*(a-1)=(a-1)2=a2-2*a+1. When a=32 we get that a2-2*a+1=961 and when a=33 we get that a2-2*a+1=1024, which is larger than 1001. Thus 33 is the smallest value of a that results in a grid where all combinations of 4 out the 14 clips can be placed.

In summary, we've found that there are 1001 combinations of 4 out 14 clips and that we need a grid that has at least 33 rows and 33 columns if we want all of these combinations to be found in 2*2 subgrids. Furthermore, we know what kind of subgrids we wish to avoid.

At this point, placing the clips in the grid is like solving a massive 33*33 sudoku with a strange set of rules. I did a bit of programming, using my favourite scripting language R, to let my computer do this for us. The resulting grid is one of the hidden cogs that is the magic behind the beat machine.

(To complicate things a bit, the grid we used is "infinite" in the sense that the left border of the grid fits with the right border, so that if you go 34 steps to the right or left, or 34 steps up or down you will come back to the same 4 clips that you started with.)

And then there is the matter of how to place the video clips in the grid... But that's another story.

Robyn recently performed at the Nobel peace prize concert. DinahMoe make awesomeness happen with interactive sounds and adaptive music. Check out their website for more examples!

Friday, December 17, 2010

Facebook friendships and urbanization

Last Tuesday Paul Butler, an intern on Facebook's data infrastructure engineering team, posted a map visualizing the "locality of friendship" on Facebook. Butler used data from friendships between the 500 million Facebook users to create a stunning visualization of the world's largest social network. The colour of the blue lines in represent friendship connections between cities and the white dots are the cities themselves (literally glowing from the density of the friendships within the city).

As interesting as the brightly shining cities and interconnecting lines are (think for a minute about what the lines represent!), many of us found the dark areas to be even more interesting. As is often the case, the most interesting questions came not from what we could see in the data set, but from what we couldn't see.

To that end, Thorsten Gätz produced a beautiful map where the Facebook friends map was put on top of a world map where areas with a high population density were marked in red. He described it as "[a] map which puts the connections of facebook users into context and shows where other social networks have the upper hand."

I am however not sure that population density is the right variable when trying to put the Facebook map into context. Parts of the third world are densely populated but not particularly urbanized. Urbanization should be highly correlated with social network usage, so perhaps that would be a more natural variable to look for in a comparison.

Ten years ago, NASA used data from meteorological satellites to produce a picture of Earth's city lights. Brighter areas are more urbanized, so this map should be useful when we try to put the Facebook map into context. Luckily, NASA's map uses the same projection as the Facebook map, so we can easily superimpose them on each other.

My graphical software skills are sadly lacking when compared to those of Gätz, but after trying my best I came up with this picture:

The red (and glowing) dots and lines are from the Facebook map whereas the blue parts come from NASA's map.

The conclusions that we can draw from this map differs somewhat from those drawn from Gätz's map. The most urbanized regions of Africa seems to coincide with the African regions on the Facebook map. Northeast Brazil, Russia and parts of eastern Europe, the middle east and China are either underrepresented or missing completely in the Facebook map. South Korea and Japan are clearly visible on the Facebook map, but even more so on the city lights map.

The picture that emerges when looking at the Facebook and urbanization maps together coincides with that from a world map of social network that was recently published over at Vincos blog, giving further support to the idea that urbanization is the right context for this map.

Dark areas of maps have always driven mankind to push further and explore the unknown. Today we can explore those dark areas from the comfort of our homes, with a little help of powerful computers and networks of satellites...

Revolutions and FlowingData drew my attention to Butler's Facebook map.

On a side note, it's also interesting to compare these pictures to world connection density and city-to-city router connections map.