Counting birthday collisions using partitions
Rob Burns, Jen McKenzie

Abstract
We use partitions to provide some formulae for counting s-collisions and other events in various forms of the Birthday Problem.
1 Introduction
The standard Birthday Problem asks for the probability that at least two people in a group have the same birthday. Perhaps the best-known result in this area is that, in a year of 365 days, a group of at least 23 people has a probability greater than $1/2$ that at least two of its members share a birthday. In the general form of the problem, the number of days in the year and the size of the group are treated as variables, and the outcomes are studied under various constraints on these two quantities. The history of the problem is unclear: both Richard von Mises (in 1932) and Harold Davenport have been credited with originating it.
Some of the questions which have been discussed in the context of the Birthday problem are:
For a year having $n$ days, what is the minimum size of a group needed to ensure that the probability of at least two people in the group having the same birthday is at least $1/2$? As mentioned earlier, for a year having 365 days a minimum of 23 people is needed for this probability to be at least $1/2$. For large values of $n$ the size of the group needs to be of order $\sqrt{n}$ (approximately $\sqrt{2n\ln 2}$) for the probability of a common birthday to reach $1/2$. See, for example, [2], [7].
What is the minimum size of a group such that the expected number of common birthdays is at least ? For a year containing days, a group of or more is needed before the expected number of common birthdays is greater than . In general the group size needs to be $O(\sqrt{n})$ for the expected number of common birthdays to be at least .
In a group of people, what is the probability that everyone in the group shares a birthday with someone else in the group? This is known as the Strong Birthday Problem. In his survey article [8] DasGupta states that for a year having days, the group having members is the smallest such that the probability of everyone sharing a birthday is .
What can be said about the distribution of outcomes as the size of the group and the number of possible birthdays approach infinity? See, for example, [4], [3], [8], [10].
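For a 365-day year, the classical answer to the first question is a group of 23 people. The minimal Python sketch below checks this figure. It computes the exact probability that all birthdays in a group are distinct, assuming birthdays are independent and uniformly distributed over the days of the year (leap years ignored). The function names are ours, not the paper's.

```python
from fractions import Fraction


def prob_all_distinct(group: int, days: int) -> Fraction:
    """Exact probability that `group` people with independent, uniform
    birthdays over `days` days all have different birthdays."""
    p = Fraction(1)
    for i in range(group):
        # the (i+1)-th person must avoid the i birthdays already taken
        p *= Fraction(days - i, days)
    return p


def smallest_group_for_half(days: int) -> int:
    """Smallest group size for which P(at least one shared birthday) >= 1/2."""
    group = 1
    while 1 - prob_all_distinct(group, days) < Fraction(1, 2):
        group += 1
    return group


print(smallest_group_for_half(365))  # 23
```

Exact rational arithmetic with `fractions.Fraction` avoids floating-point doubt at the threshold. For a large number of days $n$, the returned size grows roughly like $\sqrt{2n\ln 2}$, consistent with the $O(\sqrt{n})$ behaviour mentioned above.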
The problem arises in a number of scenarios and lends itself to many variations. In cryptography it appears in the form of the so-called birthday attack. In this situation messages are mapped to hash values, and for security reasons it is important to know how many messages need to be hashed before two are found with the same hash value (see, e.g., [18], [19], [11], [15]).
The problem arises in the study of colourings of complete graphs ([6], [9]).
It is also related to the behaviour of certain Markov Chains ([5], [12], [14]).
In this paper we will picture the problem in terms of arranging balls inside tubes or buckets and counting various types of outcomes.
2 Terminology
Suppose we have numbered balls which are arranged inside numbered tubes. The tubes have the same width as the diameter of the balls so that when more than one ball is located within the same tube a column of balls forms. We denote the number of arrangements of balls into tubes by . The order of the balls within each tube is important and is taken into account when counting the number of arrangements.
We may also arrange the balls in buckets instead of tubes. For our purposes a bucket is a tube in which the order of the balls is not important. We denote the number of arrangements of balls into buckets by .
Whether dealing with tubes or buckets, we will generally assume that the balls are numbered and therefore distinguishable. The case of indistinguishable balls, which are called bosons, has also been studied. For example, the asymptotic behaviour of bosons as the numbers of balls and containers approach infinity was studied in [1] and [3]. Formulae obtained in Sections 5 and 6 can be modified to apply to bosons.
Let $s \geq 2$ be an integer. We say an $s$-collision has occurred when a tube (or bucket) contains $s$ or more balls. We will be counting the number of $s$-collisions that occur in an arrangement of balls, so we need to define the number of $s$-collisions occurring when a tube contains $t$ balls with $t \geq s$. In the literature two separate definitions have been used to count $s$-collisions (see [4]). Under one definition a tube containing $t$ balls contributes collisions to the count of $s$-collisions. The second definition states that a tube containing $t$ balls contributes collisions to the count of $s$-collisions in an arrangement of balls. We will use the second definition in this paper, but the formulae derived here can easily be altered to accommodate the first definition.
The floor function will be denoted in the usual way by $\lfloor x \rfloor$.
3 Partitions
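We will denote a general partition of a positive integer $n$ by the letter $\lambda$. $\lambda$ is therefore a finite sequence of positive integers
$$\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k)$$
such that
$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k$$
and
$$\lambda_1 + \lambda_2 + \cdots + \lambda_k = n.$$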
Here $k$ is called the size or the number of parts of the partition $\lambda$. Although $k$ depends on $\lambda$, we will not generally make that dependence explicit.
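All of the counting formulae later in the paper are sums over partitions, usually with a cap on the largest part $\lambda_1$. Below is a minimal Python sketch for enumerating them. It assumes a partition is stored as a non-increasing tuple $(\lambda_1, \ldots, \lambda_k)$. The function `partitions` and its `max_part` parameter are our own illustrative choices.

```python
from typing import Iterator, Optional, Tuple


def partitions(total: int, max_part: Optional[int] = None) -> Iterator[Tuple[int, ...]]:
    """Yield the partitions of `total` as non-increasing tuples whose largest
    part is at most `max_part` (no cap when max_part is None)."""
    if max_part is None:
        max_part = total
    if total == 0:
        yield ()  # the empty partition of 0
        return
    for first in range(min(total, max_part), 0, -1):
        # the remaining parts may not exceed the first one
        for rest in partitions(total - first, first):
            yield (first,) + rest


print(list(partitions(4)))
# [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
print(list(partitions(4, max_part=2)))
# [(2, 2), (2, 1, 1), (1, 1, 1, 1)]
```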
The following lemma will not be used in this paper but is provided to show that the term which appears in some of the subsequent formulae is an integer.
Lemma 3.1.
Let and be a partition of . Then is divisible by . In addition is divisible by .
Proof.
We use the usual approach of taking an arbitrary prime and showing that the maximum power of dividing is greater than the maximum power dividing either of or . For an integer let denote the maximum power of dividing . We know that
[TABLE]
We have
[TABLE]
Since the forms a decreasing sequence, for fixed we have
[TABLE]
and since is an integer it follows that
[TABLE]
Therefore,
[TABLE]
The approach for is the same. We have
[TABLE]
[TABLE]
[TABLE]
∎
The above lemma shows that the tuple satisfies
[TABLE]
for every . We say that the tuple has an integral factorial ratio. This is an area of current research. See for example recent papers by Soundararajan [16], [17].
4 A commutative diagram
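Each arrangement of $m$ balls in $n$ tubes can be mapped to a partition $\lambda$ of $m$ by letting $\lambda_1$ be the number of tubes holding at least one ball, $\lambda_2$ be the number of tubes holding at least two balls, and so on. It is clear that $\lambda$ defined in this way satisfies the definition of a partition of $m$. We will call this mapping $f$. In the same way, each arrangement of $m$ balls in $n$ buckets can be mapped to a partition by a map which we will call $g$. Both mappings are many to one. If $n \geq m$ then the mappings are surjective; otherwise the common range of the mappings is the set of partitions $\lambda$ such that $\lambda_1 \leq n$.
Each arrangement of balls in tubes can be mapped to an arrangement of balls in buckets by ignoring the order of the balls in each tube. We will call this mapping $h$. It is also many to one and surjective.
The mappings $f$, $g$ and $h$ satisfy the identity
$$f = g \circ h. \tag{1}$$
This identity represents the fact that the partition associated with an arrangement of balls in tubes is the same as the partition associated with the arrangement in buckets obtained by ignoring the order of the balls in each tube.
For a set $A$, we denote the number of elements in $A$ by $|A|$.
Lemma 4.1.
Let $\lambda = (\lambda_1, \ldots, \lambda_k)$ be a partition of $m$ with $\lambda_1 \leq n$, and set $\lambda_0 = n$. Then we have
$$|f^{-1}(\lambda)| = m! \prod_{i=1}^{k} \binom{\lambda_{i-1}}{\lambda_i} = \frac{m!\,n!}{(n-\lambda_1)!\,(\lambda_1-\lambda_2)!\cdots(\lambda_{k-1}-\lambda_k)!\,\lambda_k!} \tag{2}$$
and
$$|g^{-1}(\lambda)| = \frac{m!}{\prod_{i=1}^{k} i^{\lambda_i}} \prod_{i=1}^{k} \binom{\lambda_{i-1}}{\lambda_i}. \tag{3}$$
Let $A$ be an arrangement of $m$ balls in $n$ buckets and denote the partition $g(A)$ by $\lambda$. Then
$$|h^{-1}(A)| = \prod_{i=1}^{k} i^{\lambda_i}. \tag{4}$$
Proof.
For a fixed partition $\lambda$ of $m$ there are $\binom{n}{\lambda_1}$ ways of choosing the tubes containing at least one ball, $\binom{\lambda_1}{\lambda_2}$ ways of choosing, among these, the tubes containing at least two balls, and so on. Therefore the number of ways that the tubes can be chosen so that the arrangement matches $\lambda$ is
$$\prod_{i=1}^{k} \binom{\lambda_{i-1}}{\lambda_i}.$$
Each choice of tube pattern fixes $m$ distinct positions (a tube together with a level in that tube), so there are $m!$ ways of arranging the balls in the tubes so that the balls match the pattern. Equation (2) follows.
Let $A$ be an arrangement of $m$ balls in $n$ buckets and $\lambda = g(A)$. Denote the number of balls in the $j$-th bucket by $c_j$. By definition, for each $i$,
$$\lambda_i = \left|\{\, j : c_j \geq i \,\}\right|.$$
The arrangements in $h^{-1}(A)$ are obtained by ordering the balls within each bucket, so
$$|h^{-1}(A)| = \prod_{j=1}^{n} c_j! = \prod_{j=1}^{n} \prod_{i=1}^{c_j} i = \prod_{i=1}^{k} i^{\lambda_i}.$$
Equation (3) follows from equations (2) and (4) and the identity (1), since $f^{-1}(\lambda)$ is the disjoint union of the sets $h^{-1}(A)$ over $A \in g^{-1}(\lambda)$. ∎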
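Lemma 4.1 can be checked by brute force for small cases. The Python sketch below uses our own notation:

- there are $m$ numbered balls and $n$ numbered tubes or buckets;
- a partition is a tuple $(\lambda_1,\dots,\lambda_k)$, where $\lambda_i$ is the number of containers holding at least $i$ balls;
- $\lambda_0 = n$.

It compares exhaustive counts with three quantities:

- $m!\prod_i\binom{\lambda_{i-1}}{\lambda_i}$ for tubes;
- that quantity divided by $\prod_i i^{\lambda_i}$ for buckets;
- $\prod_i i^{\lambda_i}$ for the number of tube arrangements that collapse onto a given bucket arrangement.

All helper names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb, factorial, prod


def partition_of(counts):
    """lambda_i = number of containers holding at least i balls."""
    top = max(counts, default=0)
    return tuple(sum(1 for c in counts if c >= i) for i in range(1, top + 1))


def tube_formula(lam, m, n):
    """m! * prod_{i=1..k} C(lambda_{i-1}, lambda_i), with lambda_0 = n."""
    chain = (n,) + lam
    return factorial(m) * prod(comb(chain[i - 1], chain[i]) for i in range(1, len(chain)))


def ordering_weight(lam):
    """prod_i i^{lambda_i}: ways to order the balls inside the buckets."""
    return prod(i ** part for i, part in enumerate(lam, start=1))


def tube_arrangements(m, n):
    """All arrangements of balls 0..m-1 in n tubes (each tube a bottom-to-top tuple).
    Inserting each ball in turn at any position of any tube produces every
    arrangement exactly once."""
    arrangements = [tuple(() for _ in range(n))]
    for ball in range(m):
        arrangements = [
            arr[:t] + (tube[:pos] + (ball,) + tube[pos:],) + arr[t + 1:]
            for arr in arrangements
            for t, tube in enumerate(arr)
            for pos in range(len(tube) + 1)
        ]
    return arrangements


def check_lemma(m, n):
    tube_list = tube_arrangements(m, n)
    # tube arrangements grouped by partition, versus the tube formula
    by_tubes = Counter(partition_of([len(tube) for tube in arr]) for arr in tube_list)
    for lam, count in by_tubes.items():
        assert count == tube_formula(lam, m, n), (lam, count)
    # bucket arrangements (ball i -> bucket a[i]) grouped by partition
    by_buckets = Counter(
        partition_of([a.count(b) for b in range(n)]) for a in product(range(n), repeat=m)
    )
    for lam, count in by_buckets.items():
        assert count == tube_formula(lam, m, n) // ordering_weight(lam), (lam, count)
    # forget the order inside each tube and count how many tube arrangements
    # collapse onto each bucket arrangement
    preimages = Counter(tuple(frozenset(tube) for tube in arr) for arr in tube_list)
    for buckets, size in preimages.items():
        assert size == ordering_weight(partition_of([len(s) for s in buckets]))
    return by_buckets


print(check_lemma(4, 3))
```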
5 Tubes and balls
In this section we present a formula for the number of arrangements of balls in tubes.
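Theorem 5.1.
The number of arrangements of $m$ numbered balls in $n$ numbered tubes is given by
$$\sum_{\lambda} \frac{m!\,n!}{(n-\lambda_1)!\,(\lambda_1-\lambda_2)!\cdots(\lambda_{k-1}-\lambda_k)!\,\lambda_k!} \tag{5}$$
where the sum is over all partitions $\lambda = (\lambda_1, \ldots, \lambda_k)$ of $m$ such that $\lambda_1 \leq n$.
Proof.
Each arrangement of balls in the tubes corresponds to a partition of $m$ via the mapping from tube arrangements to partitions defined in Section 4. The number of arrangements mapped to the same $\lambda$ is given by equation (2). In order to count all possible arrangements we take the sum of these numbers over all partitions $\lambda$ of $m$ with $\lambda_1 \leq n$, resulting in equation (5). ∎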
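The total number of arrangements of $m$ distinguishable balls in $n$ tubes is the rising factorial
$$n(n+1)\cdots(n+m-1) = \frac{(n+m-1)!}{(n-1)!}.$$
This holds because, once $j-1$ balls are placed, the $j$-th ball can go into any of $n+j-1$ gaps. Theorem 5.1 therefore yields a partition identity that is easy to test numerically. The Python sketch below assumes our own notation: $m$ balls, $n$ tubes, and a sum over partitions with $\lambda_1 \le n$. Each term is
$$\frac{m!\,n!}{(n-\lambda_1)!\,(\lambda_1-\lambda_2)!\cdots\lambda_k!}.$$
The function names are illustrative.

```python
from math import factorial, prod


def partitions(total, max_part):
    """Partitions of `total` as non-increasing tuples with largest part <= max_part."""
    if total == 0:
        yield ()
        return
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions(total - first, first):
            yield (first,) + rest


def tube_term(lam, m, n):
    """m! n! / ((n - l1)! (l1 - l2)! ... (l_{k-1} - l_k)! l_k!) for lam = (l1, ..., lk)."""
    gaps = [n - lam[0]] + [lam[i] - lam[i + 1] for i in range(len(lam) - 1)] + [lam[-1]]
    return factorial(m) * factorial(n) // prod(factorial(g) for g in gaps)


def tube_total_by_partitions(m, n):
    return sum(tube_term(lam, m, n) for lam in partitions(m, n))


def rising_factorial(n, m):
    """n (n+1) ... (n+m-1)."""
    return factorial(n + m - 1) // factorial(n - 1)


for m in range(1, 9):
    for n in range(1, 7):
        assert tube_total_by_partitions(m, n) == rising_factorial(n, m)
print("Theorem 5.1 identity holds for 1 <= m <= 8, 1 <= n <= 6")
```

The gaps $n-\lambda_1, \lambda_1-\lambda_2, \ldots, \lambda_k$ add up to $n$, so each term is $m!$ times a multinomial coefficient and the integer division is exact.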
Theorem 5.1 can be used to obtain an expression for the number of arrangements having no $s$-collision by restricting the sum to partitions with fewer than $s$ parts. Fairly simple formulae result when $s = 2$ and $s = 3$.
Corollary 5.2.
The number of arrangements of $m$ numbered balls in $n$ numbered tubes in which there are no 2-collisions is
$$\frac{n!}{(n-m)!}.$$
Proof.
In these arrangements all the balls lie on the bottom level of the tubes, so the corresponding partition of $m$ must have $k = 1$. The only such partition is the trivial one, $\lambda = (m)$. Equation (5) then reduces to the required formula. ∎
Corollary 5.3.
The number of arrangements of $m$ numbered balls in $n$ numbered tubes in which there are no 3-collisions is
$$\sum_{j=0}^{\lfloor m/2 \rfloor} \frac{m!\,n!}{(n-m+j)!\,(m-2j)!\,j!}.$$
Proof.
In these arrangements all the balls lie on the bottom two levels of the tubes. We therefore have $k \leq 2$ for the corresponding partitions of $m$. Partitions satisfying this constraint are given by $\lambda = (m)$ and $\lambda = (m-j, j)$ for $1 \leq j \leq \lfloor m/2 \rfloor$; the term $j = 0$ of the sum corresponds to $\lambda = (m)$. The corollary follows from equation (5). ∎
Note that when $m > 2n$ all arrangements have at least one 3-collision. The expression in Corollary 5.3 still makes sense, and it sums to $0$ once we adopt the convention that $1/a! = 0$ when $a$ is a negative integer.
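Corollaries 5.2 and 5.3 can be compared against an independent count. A tube arrangement is determined by two things: its occupancy vector $(c_1,\dots,c_n)$, and an assignment of the $m$ distinct balls to the $m$ resulting positions. There are therefore $m!$ tube arrangements per occupancy vector.

The Python sketch below is our own illustration (names are hypothetical, $m$ balls and $n$ tubes is our notation). It counts occupancy vectors with every $c_j \le 1$ or every $c_j \le 2$, and compares the results with $n!/(n-m)!$ and with
$$\sum_{j=0}^{\lfloor m/2\rfloor} \frac{m!\,n!}{(n-m+j)!\,(m-2j)!\,j!}.$$
A term is treated as zero whenever a factorial argument is negative.

```python
from itertools import product
from math import factorial


def no_2_collision_tubes(m, n):
    """n!/(n-m)!, read as 0 when m > n."""
    return factorial(n) // factorial(n - m) if m <= n else 0


def no_3_collision_tubes(m, n):
    """sum_j m! n! / ((n-m+j)! (m-2j)! j!), dropping terms with n-m+j < 0."""
    total = 0
    for j in range(m // 2 + 1):
        if n - m + j < 0:
            continue  # convention: 1/a! = 0 for negative a
        falling = factorial(n) // factorial(n - m + j)
        total += falling * (factorial(m) // (factorial(m - 2 * j) * factorial(j)))
    return total


def brute_force(m, n, cap):
    """m! times the number of occupancy vectors with every tube holding at most `cap` balls."""
    vectors = sum(1 for c in product(range(cap + 1), repeat=n) if sum(c) == m)
    return factorial(m) * vectors


for m in range(0, 8):
    for n in range(1, 5):
        assert no_2_collision_tubes(m, n) == brute_force(m, n, 1)
        assert no_3_collision_tubes(m, n) == brute_force(m, n, 2)
print(no_3_collision_tubes(7, 3))  # 0, since m > 2n
```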
Theorem 5.4.
Let $s \geq 2$. The total number of $s$-collisions in all arrangements of $m$ numbered balls in $n$ numbered tubes is equal to
[TABLE]
where the sum is over all partitions $\lambda$ of $m$ such that $\lambda_1 \leq n$ and $k \geq s$.
Proof.
Any arrangement corresponding to a partition $\lambda$ of $m$ with $k < s$ has no $s$-collisions, as all balls lie below the $s$-th level of the tubes. We therefore only need to consider partitions with $k \geq s$. We begin with equation (5). Each term in the sum in equation (5) is the number of arrangements corresponding to a particular partition $\lambda$. The number of $s$-collisions is the same for each arrangement having the same $\lambda$, so we need to calculate the number of $s$-collisions occurring for each of these partitions. As mentioned earlier, the $s$-collisions contributed by each tube are counted according to the second definition in Section 2. For the partition $\lambda$ with $k \geq s$ there are $\lambda_s - \lambda_{s+1}$ tubes containing exactly $s$ balls, $\lambda_{s+1} - \lambda_{s+2}$ tubes containing exactly $s+1$ balls, $\ldots$, and $\lambda_k$ tubes containing exactly $k$ balls. Therefore, each $\lambda$ with $k \geq s$ contributes
[TABLE]
$s$-collisions to the total. Combining this with equation (5) yields equation (6). ∎
6 Buckets and balls
In this section we replace the tubes by buckets. The results from Section 5 can be used, with an appropriate adjustment to take into account that the balls are not ordered within each bucket. A number of closed-form expressions have been published for the number of arrangements of balls in buckets satisfying various properties. For example, McKinney [13] provided an expression for the number of arrangements in which there is no $s$-collision. Brink [7] provided an exact formula for the least number of balls (in terms of the number of buckets) such that the arrangements containing a 2-collision make up at least half of all arrangements.
The number of arrangements containing an $s$-collision can also be calculated using a recursive formula. Such a formula was given by Suzuki, Tonien, Kurosawa and Toyota in [19].
In this section we will use partitions to construct formulae for various Birthday events.
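Theorem 6.1.
The number of arrangements of $m$ numbered balls in $n$ numbered buckets is given by
$$\sum_{\lambda} \frac{m!\,n!}{\left(\prod_{i=1}^{k} i^{\lambda_i}\right)(n-\lambda_1)!\,(\lambda_1-\lambda_2)!\cdots(\lambda_{k-1}-\lambda_k)!\,\lambda_k!} \tag{7}$$
where the sum is over all partitions $\lambda = (\lambda_1, \ldots, \lambda_k)$ of $m$ such that $\lambda_1 \leq n$.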
Proof.
This follows from Theorem 5.1 and equation (4). ∎
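Since the number of arrangements of $m$ balls in $n$ buckets is $n^m$, we have
$$n^m = \sum_{\lambda} \frac{m!\,n!}{\left(\prod_{i=1}^{k} i^{\lambda_i}\right)(n-\lambda_1)!\,(\lambda_1-\lambda_2)!\cdots(\lambda_{k-1}-\lambda_k)!\,\lambda_k!} \tag{8}$$
where the sum is over all partitions $\lambda = (\lambda_1, \ldots, \lambda_k)$ of $m$ such that $\lambda_1 \leq n$.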
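Dividing each term of the bucket sum by $n^m$ gives the probability that a uniformly random arrangement has occupancy profile $\lambda$. The Python sketch below is an illustration using our own names and notation ($m$ balls, $n$ buckets). It does two things:

- checks that the terms add up to exactly $n^m$;
- reads off the probability of no shared birthday among 23 people in a 365-day year from the single partition $\lambda = (23)$.

Each term is taken to be
$$\frac{m!\,n!}{\prod_i i^{\lambda_i}\,(n-\lambda_1)!\,(\lambda_1-\lambda_2)!\cdots\lambda_k!}.$$

```python
from fractions import Fraction
from math import factorial, prod


def partitions(total, max_part):
    """Partitions of `total` as non-increasing tuples with largest part <= max_part."""
    if total == 0:
        yield ()
        return
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions(total - first, first):
            yield (first,) + rest


def bucket_term(lam, m, n):
    """Number of arrangements of m numbered balls in n buckets whose profile is lam."""
    gaps = [n - lam[0]] + [lam[i] - lam[i + 1] for i in range(len(lam) - 1)] + [lam[-1]]
    weight = prod(i ** part for i, part in enumerate(lam, start=1))
    return factorial(m) * factorial(n) // (weight * prod(factorial(g) for g in gaps))


m, n = 23, 365
terms = {lam: bucket_term(lam, m, n) for lam in partitions(m, n)}
assert sum(terms.values()) == n ** m  # the identity in equation (8)

# lambda = (m): every bucket holds at most one ball, i.e. no shared birthday
p_distinct = Fraction(terms[(m,)], n ** m)
print(f"P(no shared birthday) = {float(p_distinct):.4f}")
```

The printed value is about 0.4927, the complement of the familiar 50.7% chance of a shared birthday among 23 people.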
When , the relevant partitions of in equation (8) are of the form
[TABLE]
Some algebra then produces the well known formula
[TABLE]
Corollary 6.2.
The number of arrangements of $m$ numbered balls in $n$ numbered buckets in which there are no 2-collisions is
$$\frac{n!}{(n-m)!}.$$
Proof.
The proof is the same as for Corollary 5.2. ∎
The following corollary appears as equation (4) in DasGupta’s survey article [8].
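Corollary 6.3.
The number of arrangements of $m$ numbered balls in $n$ numbered buckets in which there are no 3-collisions is
$$\sum_{j=0}^{\lfloor m/2 \rfloor} \frac{m!\,n!}{2^{j}\,(n-m+j)!\,(m-2j)!\,j!}.$$
Proof.
In these arrangements each bucket contains at most 2 balls. We therefore have $k \leq 2$ for the corresponding partitions of $m$. Partitions satisfying this constraint are given by $\lambda = (m)$ and $\lambda = (m-j, j)$ for $1 \leq j \leq \lfloor m/2 \rfloor$. For the partition $\lambda = (m-j, j)$ we have
$$\prod_{i=1}^{k} i^{\lambda_i} = 1^{m-j}\,2^{j} = 2^{j},$$
and the case $\lambda = (m)$ corresponds to $j = 0$. The corollary follows from equation (7). ∎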
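Corollary 6.3 can also be compared against an independent count. The number of bucket arrangements with occupancy vector $(c_1,\dots,c_n)$ is the multinomial coefficient $m!/(c_1!\cdots c_n!)$. The Python sketch below uses hypothetical function names and our notation ($m$ balls, $n$ buckets). It sums these coefficients over vectors with every $c_j \le 2$ and compares the result with
$$\sum_{j=0}^{\lfloor m/2\rfloor} \frac{m!\,n!}{2^j\,(n-m+j)!\,(m-2j)!\,j!},$$
dropping terms with a negative factorial argument. It then uses the closed form to evaluate the probability that no three people in a group share a birthday. The group size of 50 is an arbitrary illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def no_3_collision_buckets(m, n):
    """sum_j m! n! / (2^j (n-m+j)! (m-2j)! j!), dropping terms with n-m+j < 0."""
    total = 0
    for j in range(m // 2 + 1):
        if n - m + j < 0:
            continue  # convention: 1/a! = 0 for negative a
        total += factorial(m) * factorial(n) // (
            2 ** j * factorial(n - m + j) * factorial(m - 2 * j) * factorial(j)
        )
    return total


def brute_force(m, n, cap=2):
    """Sum of multinomial coefficients over occupancy vectors with entries <= cap."""
    return sum(
        factorial(m) // prod(factorial(c) for c in vec)
        for vec in product(range(cap + 1), repeat=n)
        if sum(vec) == m
    )


for m in range(0, 8):
    for n in range(1, 5):
        assert no_3_collision_buckets(m, n) == brute_force(m, n)

# probability that no three of 50 people share a birthday in a 365-day year
print(float(Fraction(no_3_collision_buckets(50, 365), 365 ** 50)))
```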
Theorem 6.4.
Let $s \geq 2$. The total number of $s$-collisions in all arrangements of $m$ numbered balls in $n$ numbered buckets is given by
[TABLE]
where the sum is over all partitions $\lambda$ of $m$ such that $\lambda_1 \leq n$ and $k \geq s$.
Proof.
This follows from Theorem 5.4 and equation (4). ∎
Subsets of arrangements can be counted using equation (7) by restricting the choice of partitions in the sum. For example, to count the number of arrangements in which at least $r$ buckets have an $s$-collision, the sum in (7) should only include partitions such that $\lambda_s \geq r$ (with $\lambda_s = 0$ when $s > k$). The Strong Birthday Problem asks for the number of arrangements in which no bucket contains exactly one ball. Since exactly $\lambda_1 - \lambda_2$ buckets contain a single ball, this number is obtained from (7) by restricting to partitions such that $\lambda_1 = \lambda_2$.
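Below is a sketch of the restricted sums described above. Under our reading of the notation ($m$ balls, $n$ buckets, $\lambda_i = 0$ for $i > k$), the two conditions are:

- at least $r$ buckets have an $s$-collision exactly when $\lambda_s \ge r$;
- no bucket holds exactly one ball exactly when $\lambda_1 = \lambda_2$.

The Python sketch uses hypothetical names. It evaluates both restricted sums and checks them against exhaustive enumeration for small cases.

```python
from itertools import product
from math import factorial, prod


def partitions(total, max_part):
    """Partitions of `total` as non-increasing tuples with largest part <= max_part."""
    if total == 0:
        yield ()
        return
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions(total - first, first):
            yield (first,) + rest


def part(lam, i):
    """lambda_i, with the convention lambda_i = 0 for i > k."""
    return lam[i - 1] if i <= len(lam) else 0


def bucket_term(lam, m, n):
    """Arrangements of m >= 1 numbered balls in n buckets whose profile is lam."""
    gaps = [n - lam[0]] + [lam[i] - lam[i + 1] for i in range(len(lam) - 1)] + [lam[-1]]
    weight = prod(i ** p for i, p in enumerate(lam, start=1))
    return factorial(m) * factorial(n) // (weight * prod(factorial(g) for g in gaps))


def restricted_count(m, n, keep):
    """Sum of bucket_term over partitions lam of m (lambda_1 <= n) with keep(lam) true."""
    return sum(bucket_term(lam, m, n) for lam in partitions(m, n) if keep(lam))


def profile(assignment, n):
    """Partition associated with a bucket arrangement (ball i -> bucket assignment[i])."""
    counts = [assignment.count(b) for b in range(n)]
    return tuple(sum(1 for c in counts if c >= i) for i in range(1, max(counts) + 1))


def brute(m, n, keep):
    return sum(1 for a in product(range(n), repeat=m) if keep(profile(a, n)))


def strong(lam):
    return part(lam, 1) == part(lam, 2)  # no bucket holds exactly one ball


def at_least(r, s):
    return lambda lam: part(lam, s) >= r  # at least r buckets hold s or more balls


for m in range(1, 7):
    for n in range(1, 5):
        assert restricted_count(m, n, strong) == brute(m, n, strong)
        assert restricted_count(m, n, at_least(2, 2)) == brute(m, n, at_least(2, 2))
print(restricted_count(6, 3, strong))
```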
References
[1] Scott Aaronson and Alex Arkhipov. The computational complexity of linear optics. In Proceedings of STOC 2011, 2011.
[2] S. E. Ahmed and R. J. McIntosh. An asymptotic approximation for the birthday problem. Crux Math., 26:151–155, 2000.
[3] Alex Arkhipov and Greg Kuperberg. The bosonic birthday paradox. arXiv:1106.0849, 2011.
[4] R. Arratia, S. Garibaldi, and J. Kilian. Asymptotic distribution for the birthday problem with multiple coincidences, via an embedding of the collision process. Random Structures and Algorithms, 48(3):480–502, 2016.
[5] Itai Benjamini and Ben Morris. The birthday problem and Markov chain Monte Carlo. arXiv:math/0701390, 2007.
[6] Bhaswar B. Bhattacharya, Somabha Mukherjee, and Sumit Mukherjee. Birthday paradox, monochromatic subgraphs, and the second moment phenomenon. arXiv:1711.01465, 2017.
[7] David Brink. A (probably) exact solution to the birthday problem. The Ramanujan Journal, 28(2):223–238, April 2012.
[8] Anirban DasGupta. The matching, birthday and the strong birthday problem: a contemporary review. Journal of Statistical Planning and Inference, 130(1):377–389, 2005. Herman Chernoff: Eightieth Birthday Felicitation Volume.
