Thursday, February 10, 2011

Sharks! Sharks!

The International Shark Attack File, or ISAF for short, recently published its 2010 worldwide shark attack summary, and the findings have been reported by international media over the last couple of days. The main message has been that the number of unprovoked shark attacks last year, 79, was larger than usual.

The question that is left unanswered in the articles that I've read is "just how unusual is 79 shark attacks in a year?" Let's find out!

Assume that a trial is performed n independent times (where n is large) and that at each trial the probability p of some given event is small. If X is the number of trials in which the event occurs, then X will be approximately Poisson distributed. This means that the probability that X equals some value k will be approximately

    P(X = k) = m^k * e^(-m) / k!

where k! = k*(k-1)*(k-2)*...*3*2*1 and m = n*p is the average number of times that the event will occur. This fact has not only been seen empirically, but can also be proved using "basic" tools of probability theory.
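To get a feel for how good the Poisson approximation is, here is a minimal Python sketch that compares the exact binomial probabilities with the Poisson ones. The values n = 10,000 and p = 0.0005 are made up purely for illustration; they have nothing to do with sharks.

```python
import math

def binomial_pmf(k, n, p):
    """Exact probability of k events in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, m):
    """Poisson probability of k events when the mean is m."""
    return math.exp(-m) * m**k / math.factorial(k)

# Hypothetical values: many trials, small probability per trial.
n, p = 10_000, 0.0005
m = n * p  # mean number of events, here 5

for k in range(11):
    print(f"k={k:2d}  binomial={binomial_pmf(k, n, p):.6f}  "
          f"poisson={poisson_pmf(k, m):.6f}")
```

The two columns should agree closely, and the agreement gets better as n grows and p shrinks with m = n*p held fixed.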

Now, a large number of people, n, spend time in the sea each year, and for each such person there is a small probability, p, that the person is attacked by a shark. The people are more or less independent, and therefore we can argue that the number of shark attacks in a year should be (at least approximately) Poisson distributed.

For the mathematically minded, I should probably point out that I'm aware that the shark attack probabilities p_i differ between different areas. This does not really contradict the assumption that shark attacks follow a Poisson process, as we can view the global shark attacks as the union of (essentially) independent Poisson processes with different intensities, and the sum of independent Poisson counts is again Poisson, with a mean equal to the sum of the individual means.
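The claim that independent Poisson counts add up to a Poisson count can be checked numerically. The sketch below assumes two regions with made-up yearly means of 40 and 25 attacks (hypothetical numbers, not ISAF data) and compares the distribution of their sum with a single Poisson distribution with mean 65.

```python
import math

def poisson_pmf(k, m):
    """Poisson probability of k events when the mean is m."""
    return math.exp(-m) * m**k / math.factorial(k)

def pmf_of_sum(k, m1, m2):
    """P(X1 + X2 = k) for independent X1 ~ Poisson(m1), X2 ~ Poisson(m2)."""
    return sum(poisson_pmf(i, m1) * poisson_pmf(k - i, m2)
               for i in range(k + 1))

# Hypothetical regional means, chosen only for illustration.
m1, m2 = 40.0, 25.0

for k in (50, 65, 80):
    print(f"k={k}  sum of two regions={pmf_of_sum(k, m1, m2):.8f}  "
          f"single Poisson={poisson_pmf(k, m1 + m2):.8f}")
```

The two columns should match up to floating-point rounding, which is why lumping all regions together does not break the Poisson assumption.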

If we want to calculate the probability of a certain number of shark attacks, we now need to estimate the average number of shark attacks in a year, m. Well, from the ISAF 2000-2010 statistics we see that there's been an average of 71.5 shark attacks annually, or an average of 63.6 if we only look at the 2000-2009 period. Let's assume that the intensity of shark attacks has been constant throughout the last eleven years and that deviations from the mean are random and not due to some trend. In that case we can use the above averages as our estimates of m.

Using the estimate m=71.5 we get that the probability of at least 79 shark attacks in a year is 0.17, which means that we should expect at least 79 shark attacks roughly once in every six years. In the last eleven years there have been two such years, 2000 (80 attacks) and 2010 (79 attacks), which is more or less exactly what we would expect.

If we instead use the lower estimate m=63.6 we get that the probability is 0.026, which would mean that we can expect at least 79 shark attacks roughly once in 38 years.

Were we to use the 2001-2009 data only, our estimate would be m=55.6 and the estimated probability of at least 79 shark attacks would be as small as 0.001. In this scenario, 2010 would have been a one-in-a-thousand extreme when it comes to shark attacks!
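For readers who want to redo this kind of calculation themselves, here is a small Python sketch of a Poisson tail probability using only the standard library. The three values of m are the estimates from the text; the function names are my own. It prints both P(X >= 79) and P(X > 79), since the two differ by P(X = 79) and it is easy to mix them up when quoting a figure.

```python
import math

def poisson_pmf(k, m):
    """Poisson probability of k events, computed via logs for stability."""
    return math.exp(k * math.log(m) - m - math.lgamma(k + 1))

def poisson_tail(k, m):
    """P(X >= k) for X ~ Poisson(m)."""
    return 1.0 - sum(poisson_pmf(i, m) for i in range(k))

for m in (71.5, 63.6, 55.6):
    at_least = poisson_tail(79, m)   # P(X >= 79)
    more_than = poisson_tail(80, m)  # P(X > 79)
    print(f"m={m}: P(X>=79)={at_least:.4f} "
          f"(about once in {1 / at_least:.0f} years), "
          f"P(X>79)={more_than:.4f}")
```

The "once in N years" figure is just one divided by the yearly probability, and it only makes sense under the assumption that the intensity stays constant from year to year.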

There's actually a valuable lesson here. By choosing different years to include when calculating our estimate of m we arrived at completely different conclusions. It was easy for us to do in this example, and it's just as easy for anyone else to do when they want to present statistics. Lies, damn lies, ...

People tend to be afraid of sharks, and it is therefore interesting to note that out of the 79 shark attacks last year, only 6 were fatal. According to the CIA World Factbook, 56.6 million people died in 2010. That means that shark attacks accounted for approximately 0.0000001 of all deaths, or about one death in ten million! That's a pretty abstract number, but maybe the "What's most likely to kill you?" infographic can help you visualize it. It illustrates the risks of various causes of death, but unfortunately does not include shark attacks... The ultimate shark infographic is probably this one, from last year.

Actually, this global shark catch graph tells us that the sharks are the ones who need to fear the humans.

On a side note, I teach a course called Statistics for engineers this semester and when I introduced the Poisson distribution two weeks ago, I used shark attacks as an example of an application of the distribution (along with some engineering applications, of course). I was inspired by this paper, from which I also borrowed a data set about the number of points Wayne Gretzky scored in each game during his time with the Edmonton Oilers. Funnily, I gave the lecture on January 26, which was Gretzky's 50th birthday. When I introduced the binomial distribution I used the predictions of Paul the Octopus, the 2010 FIFA World Cup oracle, as an illustrating example, and he happened to be born on the 26th of January as well. This allowed me to move seamlessly on to the birthday problem ("what is the probability that there is at least one pair of people in this room that have the same birthday?"), which we could solve using the binomial and Poisson distributions. Sometimes Fortuna is on your side...

As I have more than 120 students in the course, the probability of at least one pair of people sharing a birthday was ridiculously close to 1.
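Here is a rough Python sketch of the birthday calculation, assuming 365 equally likely birthdays (no leap years, no seasonal effects). It computes the exact probability and a Poisson approximation in which the number of matching pairs is treated as Poisson with mean (number of pairs)/365. The group size 120 is used as a lower bound for my class; the function names are my own.

```python
import math

def birthday_exact(n, days=365):
    """Exact P(at least one shared birthday) among n people."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

def birthday_poisson(n, days=365):
    """Poisson approximation: matching pairs ~ Poisson(C(n, 2) / days)."""
    m = math.comb(n, 2) / days
    return 1.0 - math.exp(-m)

for n in (23, 50, 120):
    print(f"n={n}: exact={birthday_exact(n):.10f}  "
          f"poisson={birthday_poisson(n):.10f}")
```

The classic result that 23 people are enough for a better-than-even chance shows up in the first row, and for 120 people both versions are practically indistinguishable from 1.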


  1. Just pick your favorite estimate for your facts. Unfortunately I still don't fancy swimming with sharks.. =P

    "There are three kinds of lies: lies, damned lies and statistics."
