Probability theory is the mathematics that helps us measure “chance” or “odds” of an event. It is truly, one of the most remarkable concepts of mathematics that perhaps came naturally to us, long before its equations and formulas were established formally by Blaise Pascal and Pierre de Fermat. Ever since its discovery, it is widely used in everyday lives to compare events that rely on pure chance, such as tossing a coin or winning a lottery.
When we consider a simple example, say, the odds of winning a lottery twice in a row, we get an intuitive grasp of why its value must be lower than a single win . We have both our intuition and mathematics supporting this result, but when it comes to larger and complicated problems, intuition and mathematics go their separate ways.
A classic example of this would be the Birthday Problem which gives a different result than what we might think it.
BIRTHDAY PARADOX – AN INTRODUCTION
How many times have we been a part of a classroom that consists of more than one person sharing the same birthday? All of us, at some point have encountered two such people in our lives, but perhaps never given a second thought as to why this is so frequent. The birthday problem explores the mathematics of such an occurrence, for a general n number of people.
For the sake of simplicity, consider the set of 365 calendar days, (excluding Feb 29th) to be the number of possible birthday’s anyone can have. In a class full of 366 people, we can confidently state using the Pigeonhole principle, that the probability of such an occurrence is 1.
For those of us who are not familiar with the Pigeonhole Principle, it states that if there are m pigeonholes, and if n pigeons were to be kept inside these such that n>m>0, there must atleast be one pigeonhole containing more than one pigeon. In simple words, if there are more pigeons than there are pigeonholes, there must at least be one pigeonhole with more than one pigeon.
Consider the example of 3 pigeonholes, and 4 pigeons. After putting 3 pigeons in 3 pigeonholes, we are compelled to put the remaining one in one of the already occupied pigeonholes. It is extremely simple, albeit a powerful principle that helps break down tough problems.
THE POWER OF COMBINED PROBABILITY
Now that we’ve established for values, greater than 365, P(n) = 1, where P is the event of two person sharing the same birthday in a given group of n members, let us discuss the value for low numbers such as n = 30.
We might be tricked into thinking , that the value for such numbers must be really small i.e. the odds of such events at n = 30 is very small. Analyzing the above situation using Pigeonhole principle, we could see that even if we assigned unique birthdays to all 30 people, we would still be left with 365-30= 335 such birthdays and that means a low value for P(n). Yet, the paradox lies in the fact that the real value of P(n) for n=30 is an astonishing 0.706 or 70.6 percent likely.
CORRECTION: In the above analysis, we cannot use Pigeonhole Principle as it holds only for values where n>m>0. (Here, n=30 and m =365)
To understand, why this is so, let us re-explore the wrongly applied Pigeonhole principle used for n=30. We’ve made a mistake in taking the number 30 at its face value and not understanding the underlying set of comparisons necessary..For a group of n members, the number of comparisons for which two birthdays mustn’t be same is the combination of all n members taking 2 at a time i.e. nC2. For n=30, this value is 30C2 = 435. In consider the group of n members as a whole, we are considering all possible two person relations between n members i.e. nC2 which significantly increases the probability of such an occurence in our favour.
In order to calculate the probability of finding the same birthdays in at least one pair of people in a given set, we will calculate the probability of every member to have unique birthdays and then complement it to have our answer.
P(n) is the probability of at least two members sharing the same birthday in a group of n members
Therefore, P'(n) is the probability of the n people to have unique birthdays.
i.e. P(n) = 1 – P'(n)
where P’ is the probability of n members to have distinct birthdays
To understand the above formula clearly, let us consider each member individually. The first member can have any of the 365 birthdays, member two on the other hand can have only one of 364 birthdays (excluding the one the first member already has). Member 3 can have only 363 and so on..
The above formula can be reformatted as follows :
SIGNIFICANCE OF THE PARADOX
One of the most important consequences of the “birthday problem” is the cryptography attack technique used by ethical hackers, is the “birthday attack” that exploits the high probability of collision between any two attack attempts from a fixed set of data/pigeonholes. As stated earlier, although the probability of finding two hashes/sequences, given that one of them is picked at will, is low, the total probability of “any” two inputs generating the same hash is high, contrary to what one’s intuition might suggest.