Asymptotics of the overflow in urn models

Raul Gouet; Paweł Hitczenko; Jacek Wesołowski

arXiv:1905.06663·math.PR·May 17, 2019·J. Appl. Probab.


TL;DR

This paper investigates the asymptotic behavior of overflow counts in urn models with fixed capacities, extending previous work to general capacities and providing conditions for Poisson and normal limit distributions using probabilistic methods.

Contribution

It generalizes prior results on overflow asymptotics from capacity one to arbitrary capacities, offering new probabilistic conditions for different limit distributions.

Findings

1. Provides sufficient conditions for Poisson asymptotics.
2. Provides sufficient conditions for normal asymptotics.
3. Extends previous work from capacity one to general capacities.

Abstract

Consider a number, finite or not, of urns each with fixed capacity $r$ and balls randomly distributed among them. An overflow is the number of balls that are assigned to urns that already contain $r$ balls. When $r=1$, using analytic methods, Hwang and Janson gave conditions under which the overflow (which in this case is just the number of balls landing in non-empty urns) has an asymptotically Poisson distribution as the number of balls grows to infinity. Our aim here is to systematically study the asymptotics of the overflow in the general situation, i.e. for arbitrary $r$. In particular, we provide sufficient conditions for both Poissonian and normal asymptotics for general $r$, thus extending Hwang-Janson's work. Our approach relies on purely probabilistic methods.


Equations

$$N_{n,k}(m)=\sum_{j=1}^{k-1}I_{\{X_{n,j}=m\}},$$

$$Y_{n,k}=\sum_{m\in M_{n}}I_{\{X_{n,k}=m\}}\,I_{\{N_{n,k}(m)\geq r\}}$$

$$V_{n,r}=\sum_{k=1}^{n}Y_{n,k}.$$

$$p_{n}^{*}:=\sup_{m\in M_{n}}p_{n,m}\quad\mbox{and}\quad\sum_{m\in M_{n}}p_{n,m}^{r+1}.$$

$$\mathbb{P}(N_{n,k}(m)=i)=\binom{k-1}{i}\,p_{n,m}^{i}\,q_{n,m}^{k-1-i},\quad i=0,\ldots,k-1,$$

$$N_{n,k}^{l}(m)=N_{n,l}(m)-N_{n,k}(m)=\sum_{j=k}^{l-1}I_{\{X_{n,j}=m\}},$$

$$\mathbb{P}(N_{n,k}(m_{1})\geq x_{1},\,N_{n,k}(m_{2})\geq x_{2})\leq\mathbb{P}(N_{n,k}(m_{1})\geq x_{1})\,\mathbb{P}(N_{n,k}(m_{2})\geq x_{2}).$$

$$\mathbb{P}(N_{n,k}(m_{1})\geq x_{1},\,N_{n,k}(n_{1})\geq y_{1},\,N_{n,l}(m_{2})\geq x_{2},\,N_{n,l}(n_{2})\geq y_{2})\leq\mathbb{P}(N_{n,k}(m_{1})\geq x_{1})\,\mathbb{P}(N_{n,k}(n_{1})\geq y_{1})\,\mathbb{P}(N_{n,l}(m_{2})\geq x_{2})\,\mathbb{P}(N_{n,l}(n_{2})\geq y_{2})$$

$$\mathbb{P}(N_{n,k}(m_{1})\geq x_{1},\,N_{n,l}(m_{2})\geq x_{2})\leq\mathbb{P}(N_{n,k}(m_{1})\geq x_{1})\,\mathbb{P}(N_{n,l}(m_{2})\geq x_{2}).$$

$$\sum_{m\in M_{n}}p_{n,m}^{r+1}=\mathbb{E}\,p_{X_{n}}^{r},$$

$$Y_{n,j}=\mathbb{E}\Big(\tfrac{I_{\{X_{n,j}=X_{n}\}}}{p_{X_{n}}}\,I_{\{N_{n,j}(X_{n})\geq r\}}\,\Big|\,\mathcal{F}_{n,n}\Big).$$

$$\begin{aligned}\mathbb{E}\,(Y_{n,j}|\mathcal{F}_{n,k-1})&=\mathbb{E}\Big(\tfrac{I_{\{X_{n,j}=X_{n}\}}}{p_{X_{n}}}I_{\{N_{n,j}(X_{n})\geq r\}}\Big|\mathcal{F}_{n,k-1}\Big)\\ &=\mathbb{E}\Big(\mathbb{E}\Big(\tfrac{I_{\{X_{n,j}=X_{n}\}}}{p_{X_{n}}}I_{\{N_{n,j}(X_{n})\geq r\}}\Big|X_{n},\mathcal{F}_{n,j-1}\Big)\Big|\mathcal{F}_{n,k-1}\Big)\\ &=\mathbb{E}\Big(\mathbb{E}\Big(\tfrac{I_{\{X_{n,j}=X_{n}\}}}{p_{X_{n}}}\Big|X_{n},\mathcal{F}_{n,j-1}\Big)\,I_{\{N_{n,j}(X_{n})\geq r\}}\Big|\mathcal{F}_{n,k-1}\Big)\\ &=\mathbb{E}(I_{\{N_{n,j}(X_{n})\geq r\}}|\mathcal{F}_{n,k-1}).\end{aligned}$$

$$\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})=\mathbb{P}(N_{n,k}(X_{n})\geq r\,|\,\mathcal{F}_{n,k-1})=\mathbb{P}(N_{n,k}(X_{n})\geq r\,|\,\mathcal{F}_{n,n}).$$

$$\mathbb{E}\,Y_{n,j}=\mathbb{E}\,\mathbb{P}(N_{n,j}(X_{n})\geq r\,|\,\mathcal{F}_{n,k-1})=\mathbb{E}\sum_{i=r}^{j-1}\binom{j-1}{i}\,p_{X_{n}}^{i}\,q_{X_{n}}^{j-1-i},$$

$$\mathbb{E}\big(\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})\,\mathbb{E}\,(Y_{n,l}|\mathcal{F}_{n,l-1})\big)=\mathbb{E}\big(\mathbb{P}(N_{n,k}(X_{n})\geq r\,|\,\mathcal{F}_{n,n})\,\mathbb{P}(N_{n,l}(Y_{n})\geq r\,|\,\mathcal{F}_{n,n})\big)$$

$$\mathbb{E}\big(\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})\,\mathbb{E}\,(Y_{n,l}|\mathcal{F}_{n,l-1})\big)=\mathbb{P}(N_{n,k}(X_{n})\geq r,\,N_{n,l}(Y_{n})\geq r).$$

$$\mathrm{Cov}\big(\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1}),\,\mathbb{E}\,(Y_{n,l}|\mathcal{F}_{n,l-1})\big)=\mathrm{Cov}\big(I_{\{N_{n,k}(X_{n})\geq r\}},\,I_{\{N_{n,l}(Y_{n})\geq r\}}\big).$$

$$n^{r+1}\,\mathbb{E}\,p_{X_{n}}^{r}\to(r+1)!\,\mu$$

$$np_{n}^{*}\to 0,$$

$$\frac{n^{r+1}}{m_{n}^{r}}\to(r+1)!\,\mu\quad\Rightarrow\quad V_{n,r}\stackrel{d}{\to}\mathrm{Pois}(\mu).$$

$$n^{r+1}\,\mathbb{E}\,p_{X_{n}}^{r}=\frac{(np_{n})^{r+1}}{1-(1-p_{n})^{r+1}}.$$

$$\max_{1\leq k\leq n}\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})\stackrel{\mathbb{P}}{\to}0,$$

$$\sum_{k=1}^{n}\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})\stackrel{\mathbb{P}}{\to}\eta$$

$$\sum_{k=1}^{n}\mathbb{E}\,\big(Y_{n,k}\,I_{\{|Y_{n,k}-1|>\epsilon\}}\,\big|\,\mathcal{F}_{n,k-1}\big)\stackrel{\mathbb{P}}{\to}0,$$

$$n^{s}\,\mathbb{E}\,p_{X_{n}}^{s}\to 0$$

$$n^{s+1}\,\mathbb{E}\,p_{X_{n}}^{s}\to 0,\quad s>r.$$

$$n^{s+1}\,\mathbb{E}\,p_{X_{n}}^{s}\leq(np_{n}^{*})^{s-r}\,n^{r+1}\,\mathbb{E}\,p_{X_{n}}^{r}.$$

$$\sum_{i=m}^{n}\binom{n}{i}\,p^{i}(1-p)^{n-i}\leq\frac{(np)^{m}}{m!}.$$

$$\mathbb{P}(B_{n+1}\geq m)=\mathbb{P}(B_{n}\geq m-1)\,p+\mathbb{P}(B_{n}\geq m)(1-p)\leq\frac{(np)^{m-1}}{(m-1)!}\,p+\frac{(np)^{m}}{m!}(1-p)\leq\frac{((n+1)p)^{m}}{m!},$$

$$\mathbb{E}\Big(\sum_{k=1}^{n}\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})-\mu\Big)^{2}\to 0.$$


Full text

Asymptotics of the overflow in urn models

Footnote 1: This material is based upon work supported by and while serving at the National Science Foundation. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Footnote 2: Part of the research by the last two authors was carried out while they visited the Center for Mathematical Modeling at the University of Chile. They would like to thank the first author for arranging the visits and for his hospitality, and the CMM for its generous support.

Raul Gouet (supported by grants PIA AFB-170001 and Fondecyt 1161319), Departamento de Ingenieria Matemática and CMM (UMI 2807, CNRS), Universidad de Chile

Paweł Hitczenko (on leave from Drexel University), Division of Mathematical Sciences, National Science Foundation

Jacek Wesołowski (supported by the grant 2016/21/B/ST1/00005 of the National Science Centre, Poland), Faculty of Mathematics and Information Science, Warsaw University of Technology


Keywords and phrases: Urn model; occupancy problem; random allocations; weak limit theorems

MSC 2010 subject classifications: Primary 60F05, 60K30; secondary 60K35

1 Introduction

Urn models are among the fundamental objects of classical probability theory and have long been studied at various levels of generality. We refer the reader to the classical sources [Johnson and Kotz (1977), Kolchin et al. (1978), Kotz and Balakrishnan (1997), Mahmoud (2009)] for a complete account of the theory and discussions of different models, and to e.g. [Gnedin et al. (2007), Hwang and Janson (2008), Bobecka et al. (2013)] for some of the more recent developments. Perhaps the most heavily studied characteristic is the number of occupied urns after $n$ balls have been thrown in. One reason for this is that it is often interpreted as a measure of diversity of a given population. More refined characteristics, e.g. the number of urns containing a prescribed number of balls, have subsequently been studied for various urn models. In diversity analysis, the number $M_{k}$ of urns with exactly $k$ balls is called the abundance count of order $k$. In particular, the popular estimator of species richness called the Chao estimator is based on $M_{1}$ and $M_{2}$ (with a more sophisticated version also using $M_{3}$ and $M_{4}$); see e.g. [Chao and Chiu (2016)]. In [Hwang and Janson (2008)] the authors used analytical methods based on Poissonization and de-Poissonization to prove that the number of empty urns is asymptotically normal as long as its variance grows to infinity (this is clearly the minimal requirement). As a by-product of their method they established the Poissonian asymptotics of the number of balls that fall into non-empty urns when the variance is finite and under additional assumptions on the distribution among boxes. We mention in passing that the number of balls falling into non-empty urns is sometimes called the number of collisions. Under the uniformity assumption for the distribution of balls it has been used, for example, for testing random number generators (see [Knuth (1998), vol. 2, §3.3.2 I] for more details). We refer also to [Arratia et al. (2016)] and references therein for another illustration of how this concept is used, e.g. in cryptology.

Our main aim here is to extend the result of Hwang and Janson by considering the number of balls falling into urns containing at least $r$ balls (thus, their result corresponds to $r=1$ ). Relying on purely probabilistic methods we provide sufficient conditions for both Poissonian and normal asymptotics for the number of balls falling into such urns.

One way to formulate the problem is as follows. There is a collection (possibly infinite) of distinct containers into which balls are to be inserted. All containers have the same finite capacity. Each arriving ball is placed in one of the containers, randomly and independently of the other balls. However, if the container selected for a given ball is already full, the ball lands in the overflow basket. We are interested in the number of balls in that basket as more and more balls arrive. The notion of the overflow is not entirely new and has appeared, for example, in the context of collision resolution for hashing algorithms; see the discussion in the section “External searching” in [Knuth (1998), vol. 3, §6.4]. We also refer to subsequent work [Ramakrishna (1987), Monahan (1987)] for the computation of the probability that there is no overflow (under the uniformity assumption), and to [Dupuis et al. (2004)] which, in part, concerns the estimation of the probability of an unusually large overflow. As far as we are aware, however, the asymptotic behavior of the overflow has not been systematically investigated.

More precisely, we consider the following model. For any $n\geq 1$, let $X_{n,1},\ldots,X_{n,n}$ be iid random variables with values in $M_{n}\subset{\mathbb{N}}:=\{1,2,\ldots\}$ and let $p_{n,m}=\mathbb{P}(X_{n,1}=m)$, $m\in M_{n}$, be the common distribution among the boxes for each of the $n$ balls in the $n$th experiment. Let also

$$N_{n,k}(m)=\sum_{j=1}^{k-1}I_{\{X_{n,j}=m\}},$$

for any $n\in{\mathbb{N}}$, $k\in\{1,\ldots,n,n+1\}$ and $m\in M_{n}$, where $I_{\{\cdot\}}$ denotes the indicator of the event within brackets. That is, $N_{n,k}(m)$ is the number of balls among the first $k-1$ balls for which the $m$th box was selected.

Let $r$ be a given positive integer, which denotes the (same) capacity of every container. Then

$$Y_{n,k}=\sum_{m\in M_{n}}I_{\{X_{n,k}=m\}}\,I_{\{N_{n,k}(m)\geq r\}}\tag{1}$$

is 1 if the $k$ th ball lands in the overflow, and is 0 otherwise. Naturally, $Y_{n,k}=0$ for $k=1,\ldots,r$ . Consequently, the size of the overflow, denoted $V_{n,r}$ , can be written as

$$V_{n,r}=\sum_{k=1}^{n}Y_{n,k}.$$
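The definition of the overflow can be made concrete with a short sketch. The function name `overflow` and the representation of an experiment as a plain sequence of urn labels are assumptions made for illustration only; the logic follows the description above, where ball $k$ overflows when at least $r$ earlier balls selected the same urn.

```python
from collections import Counter

def overflow(assignments, r):
    """Return V_{n,r} for a sequence of urn labels X_{n,1}, ..., X_{n,n}."""
    seen = Counter()   # seen[m] plays the role of N_{n,k}(m): earlier balls that chose urn m
    spilled = 0        # running sum of the indicators Y_{n,k}
    for urn in assignments:
        if seen[urn] >= r:   # urn already holds r balls, so this ball overflows
            spilled += 1
        seen[urn] += 1
    return spilled

print(overflow([1, 1, 1, 2, 1], r=1))
print(overflow([1, 1, 1, 2, 1], r=2))
```

Equivalently, the overflow is the sum over urns of the excess of the final count over $r$, which gives a quick sanity check for any implementation.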

We are interested in the asymptotic distribution of $V_{n,r}$ , as $n\to\infty$ . We will show that there are regimes relating $(p_{n,m})_{m\in M_{n}}$ and $n\to\infty$ under which the limiting distribution of $V_{n,r}$ (possibly standardized) is either Poisson or normal. These regimes will be defined through the limiting behavior of

$$p_{n}^{*}:=\sup_{m\in M_{n}}p_{n,m}\quad\mbox{and}\quad\sum_{m\in M_{n}}p_{n,m}^{r+1}.$$

Actually, we impose assumptions on $\lim\limits_{n\to\infty}\,np_{n}^{*}$ and $\lim\limits_{n\to\infty}\,n^{r+1}\,\sum_{m\in M_{n}}\,p_{n,m}^{r+1}$ .

1.1 Multinomial distribution and negative association

Note that, for distinct $m_{1},\ldots,m_{s}\in M_{n}$ and any $k=1,\ldots,n$ , $(N_{n,k}(m_{1}),\ldots,N_{n,k}(m_{s}))$ has multinomial distribution $\mathrm{Mn}_{s}(k-1;p_{n,m_{1}},\ldots,p_{n,m_{s}})$ . In particular, $N_{n,k}(m)$ has the binomial distribution ${\operatorname{Bin}}(k-1,p_{n,m})$ , that is,

$$\mathbb{P}(N_{n,k}(m)=i)=\binom{k-1}{i}\,p_{n,m}^{i}\,q_{n,m}^{k-1-i},\quad i=0,\ldots,k-1,$$

where $q_{n,m}=1-p_{n,m}$ . Also, let

$$N_{n,k}^{l}(m)=N_{n,l}(m)-N_{n,k}(m)=\sum_{j=k}^{l-1}I_{\{X_{n,j}=m\}},$$

for $k<l$ , and $N_{n,k}^{l}(m)=0$ , for $k\geq l$ . Then, for distinct $j_{1},\ldots,j_{t}\in M_{n}$ and $k<l$ , $(N_{n,k}^{l}(j_{1}),\ldots,N_{n,k}^{l}(j_{t}))$ has multinomial distribution $\mathrm{Mn}_{t}(l-k;p_{n,j_{1}},\ldots,p_{n,j_{t}})$ . Moreover, vectors $(N_{n,k}(m_{1}),\ldots,N_{n,k}(m_{s}))$ and $(N_{n,k}^{l}(j_{1}),\ldots,N_{n,k}^{l}(j_{t}))$ are independent. Further, it is well known that multinomial random variables are negatively orthant dependent (NOD), that is, for $m_{1}\neq m_{2}$

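$$\mathbb{P}(N_{n,k}(m_{1})\geq x_{1},\,N_{n,k}(m_{2})\geq x_{2})\leq\mathbb{P}(N_{n,k}(m_{1})\geq x_{1})\,\mathbb{P}(N_{n,k}(m_{2})\geq x_{2}).\tag{3}$$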

They are also negatively associated (NA); see [Joag-Dev and Proschan (1983)] for the definition and basic properties $P_{1},\ldots,P_{7}$.

In particular, both sets $N_{n,k}(m_{1}),\ldots,N_{n,k}(m_{s})$ and $N_{n,k}^{l}(j_{1}),\ldots,N_{n,k}^{l}(j_{t})$ are NA and, by property $P_{7}$, the combined set of $N_{n,k}$ and $N_{n,k}^{l}$ variables is also NA. Hence, by $P_{4}$, for distinct $m_{1},m_{2},n_{1},n_{2}$, the subset $N_{n,k}(m_{1})$, $N_{n,k}(n_{1})$, $N_{n,k}(m_{2})$, $N_{n,k}(n_{2})$, $N_{n,k}^{l}(m_{2})$, $N_{n,k}^{l}(n_{2})$ is NA as well. Finally, noting that $N_{n,l}(m)=N_{n,k}(m)+N_{n,k}^{l}(m)$, we conclude by $P_{6}$ that $N_{n,k}(m_{1})$, $N_{n,k}(n_{1})$, $N_{n,l}(m_{2})$, $N_{n,l}(n_{2})$ are NA.

Consequently, the following extended versions of the NOD property (3) hold:

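$$\mathbb{P}(N_{n,k}(m_{1})\geq x_{1},\,N_{n,k}(n_{1})\geq y_{1},\,N_{n,l}(m_{2})\geq x_{2},\,N_{n,l}(n_{2})\geq y_{2})\leq\mathbb{P}(N_{n,k}(m_{1})\geq x_{1})\,\mathbb{P}(N_{n,k}(n_{1})\geq y_{1})\,\mathbb{P}(N_{n,l}(m_{2})\geq x_{2})\,\mathbb{P}(N_{n,l}(n_{2})\geq y_{2})\tag{4}$$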

and, taking $y_{1}=y_{2}=0$ in (4),

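$$\mathbb{P}(N_{n,k}(m_{1})\geq x_{1},\,N_{n,l}(m_{2})\geq x_{2})\leq\mathbb{P}(N_{n,k}(m_{1})\geq x_{1})\,\mathbb{P}(N_{n,l}(m_{2})\geq x_{2}).\tag{5}$$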
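The orthant inequality for two urn counts at a common time can be checked exactly on a toy example. The sketch below is only an illustration, not part of the argument: it enumerates every allocation of a handful of balls among three urns (the probabilities, the number of balls and the helper name `joint_and_product` are all assumptions) and compares the joint upper-tail probability with the product of the marginal tails.

```python
from itertools import product
from math import prod

def joint_and_product(probs, balls, m1, m2, x1, x2):
    """Return (P(N(m1)>=x1, N(m2)>=x2), P(N(m1)>=x1) * P(N(m2)>=x2)).

    Probabilities are computed by brute-force enumeration of all
    sequences of urn labels, so this is only feasible for tiny cases.
    """
    joint = tail1 = tail2 = 0.0
    for seq in product(range(len(probs)), repeat=balls):
        weight = prod(probs[u] for u in seq)   # probability of this allocation
        a = seq.count(m1) >= x1
        b = seq.count(m2) >= x2
        joint += weight * (a and b)
        tail1 += weight * a
        tail2 += weight * b
    return joint, tail1 * tail2

j, p = joint_and_product([0.5, 0.3, 0.2], balls=4, m1=0, m2=1, x1=2, x2=1)
print(j, p)   # the first value should not exceed the second
```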

1.2 Auxiliary random variables

We find it convenient to introduce sequences of random variables $(X_{n})$ and $(Y_{n})$ such that, for any $n\in{\mathbb{N}}$, the random variables $X_{n},Y_{n},X_{n,1},\ldots,X_{n,n}$ are iid. This generally simplifies expressions, because sums over $m\in M_{n}$ can be represented as expectations and computations can be carried out compactly by means of conditional expectations. For example,

$$\sum_{m\in M_{n}}p_{n,m}^{r+1}=\mathbb{E}\,p_{X_{n}}^{r},$$

where here and everywhere below we write $p_{X_{n}}$ for $p_{n,X_{n}}$ .
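As a worked instance of this device (a sketch that uses only the fact that $X_{n}$ has distribution $(p_{n,m})_{m\in M_{n}}$), the expectation of $p_{X_{n}}^{r}$ unfolds as

$$\mathbb{E}\,p_{X_{n}}^{r}=\sum_{m\in M_{n}}\mathbb{P}(X_{n}=m)\,p_{n,m}^{r}=\sum_{m\in M_{n}}p_{n,m}\cdot p_{n,m}^{r}=\sum_{m\in M_{n}}p_{n,m}^{r+1}.$$

For instance, with $m_{n}$ equiprobable urns both sides equal $m_{n}^{-r}$.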

Let $\mathcal{F}_{n,k}=\sigma(X_{n,1},\ldots,X_{n,k})$ be the $\sigma$ -algebra generated by $X_{n,1},\ldots,X_{n,k}$ , for $k=1,\ldots,n$ , and note that $N_{n,j}(m)$ is $\mathcal{F}_{n,k}$ -measurable, for any $m\in M_{n},k\geq j-1$ . Note also that, for any $n,k$ , $X_{n}$ is independent of $\mathcal{F}_{n,k}$ . Then $Y_{n,j}$ can be written as

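$$Y_{n,j}=\mathbb{E}\Big(\tfrac{I_{\{X_{n,j}=X_{n}\}}}{p_{X_{n}}}\,I_{\{N_{n,j}(X_{n})\geq r\}}\,\Big|\,\mathcal{F}_{n,n}\Big).$$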

So, for $j\geq k$ ,

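$$\begin{aligned}\mathbb{E}\,(Y_{n,j}|\mathcal{F}_{n,k-1})&=\mathbb{E}\Big(\tfrac{I_{\{X_{n,j}=X_{n}\}}}{p_{X_{n}}}I_{\{N_{n,j}(X_{n})\geq r\}}\Big|\mathcal{F}_{n,k-1}\Big)\\ &=\mathbb{E}\Big(\mathbb{E}\Big(\tfrac{I_{\{X_{n,j}=X_{n}\}}}{p_{X_{n}}}I_{\{N_{n,j}(X_{n})\geq r\}}\Big|X_{n},\mathcal{F}_{n,j-1}\Big)\Big|\mathcal{F}_{n,k-1}\Big)\\ &=\mathbb{E}\Big(\mathbb{E}\Big(\tfrac{I_{\{X_{n,j}=X_{n}\}}}{p_{X_{n}}}\Big|X_{n},\mathcal{F}_{n,j-1}\Big)\,I_{\{N_{n,j}(X_{n})\geq r\}}\Big|\mathcal{F}_{n,k-1}\Big)\\ &=\mathbb{E}(I_{\{N_{n,j}(X_{n})\geq r\}}|\mathcal{F}_{n,k-1}).\end{aligned}\tag{7}$$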

Hence, $\mathbb{E}\,(Y_{n,j}|\mathcal{F}_{n,k})=\mathbb{E}(I_{\{N_{n,j}(X_{n})\geq r\}}|\mathcal{F}_{n,k})$ , for $j>k$ , and $\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k})=Y_{n,k}$ .

Note that representation (7) implies

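$$\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})=\mathbb{P}(N_{n,k}(X_{n})\geq r\,|\,\mathcal{F}_{n,k-1})=\mathbb{P}(N_{n,k}(X_{n})\geq r\,|\,\mathcal{F}_{n,n}).\tag{8}$$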

Taking expectations of both extremes of (7) we get

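$$\mathbb{E}\,Y_{n,j}=\mathbb{E}\,\mathbb{P}(N_{n,j}(X_{n})\geq r\,|\,\mathcal{F}_{n,k-1})=\mathbb{E}\sum_{i=r}^{j-1}\binom{j-1}{i}\,p_{X_{n}}^{i}\,q_{X_{n}}^{j-1-i},\tag{9}$$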

where $q_{X_{n}}=1-p_{X_{n}}$ . Furthermore, for $k,l=1\ldots,n$ , (8) yields

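$$\mathbb{E}\big(\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})\,\mathbb{E}\,(Y_{n,l}|\mathcal{F}_{n,l-1})\big)=\mathbb{E}\big(\mathbb{P}(N_{n,k}(X_{n})\geq r\,|\,\mathcal{F}_{n,n})\,\mathbb{P}(N_{n,l}(Y_{n})\geq r\,|\,\mathcal{F}_{n,n})\big)$$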

and, because $N_{n,k}(X_{n})$ and $N_{n,l}(Y_{n})$ are conditionally independent given $\mathcal{F}_{n,n}$ , it follows that

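$$\mathbb{E}\big(\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})\,\mathbb{E}\,(Y_{n,l}|\mathcal{F}_{n,l-1})\big)=\mathbb{P}(N_{n,k}(X_{n})\geq r,\,N_{n,l}(Y_{n})\geq r).$$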

Consequently, for any $k,l$ ,

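$$\mathrm{Cov}\big(\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1}),\,\mathbb{E}\,(Y_{n,l}|\mathcal{F}_{n,l-1})\big)=\mathrm{Cov}\big(I_{\{N_{n,k}(X_{n})\geq r\}},\,I_{\{N_{n,l}(Y_{n})\geq r\}}\big).\tag{10}$$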
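Since $N_{n,k}(m)$ is binomial with parameters $k-1$ and $p_{n,m}$, the mean of each indicator is $\mathbb{E}\,Y_{n,k}=\sum_{m}p_{n,m}\,\mathbb{P}({\operatorname{Bin}}(k-1,p_{n,m})\geq r)$, and summing over $k$ gives the exact mean of the overflow. The following sketch evaluates that sum numerically; it assumes a finite list of urn probabilities, and the function names are hypothetical.

```python
from math import comb

def binom_tail(n, p, m):
    """P(Bin(n, p) >= m), computed from the complementary lower sum."""
    lower = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(min(m, n + 1)))
    return max(0.0, 1.0 - lower)   # guard against tiny negative rounding errors

def expected_overflow(probs, n, r):
    """Exact E V_{n,r} for n balls thrown into urns with the given probabilities."""
    return sum(p * binom_tail(k - 1, p, r)
               for k in range(1, n + 1)
               for p in probs)

# Two equiprobable urns, two balls, capacity one: the second ball
# overflows exactly when it repeats the first ball's urn.
print(expected_overflow([0.5, 0.5], n=2, r=1))
```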

2 Poissonian asymptotics

Let $\mathrm{Pois}(\mu)$ denote the Poisson distribution with parameter $\mu$ .

Theorem 2.1.

Let $\mu>0$ . If

$$n^{r+1}\,\mathbb{E}\,p_{X_{n}}^{r}\to(r+1)!\,\mu\tag{11}$$

and

$$np_{n}^{*}\to 0,\tag{12}$$

then $V_{n,r}\stackrel{{\scriptstyle d}}{{\to}}\,\mathrm{Pois}(\mu)$ .
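A small Monte Carlo experiment can make the statement tangible in the simplest setting of $m_{n}$ equiprobable urns, where $\mathbb{E}\,p_{X_{n}}^{r}=m_{n}^{-r}$. The sketch below is an illustration under assumptions, not the simulation behind the paper's figures: the values of $n$, $r$, $\mu$, the number of trials and the seed are arbitrary, $m_{n}$ is chosen so that $n^{r+1}/m_{n}^{r}\approx(r+1)!\,\mu$, and all function names are hypothetical.

```python
import random
from collections import Counter
from math import exp, factorial

def overflow(assignments, r):
    """Number of balls whose urn was already chosen by at least r earlier balls."""
    seen = Counter()
    spilled = 0
    for urn in assignments:
        if seen[urn] >= r:
            spilled += 1
        seen[urn] += 1
    return spilled

def simulate_uniform(n, r, mu, trials, seed=0):
    """Sample V_{n,r} repeatedly with m_n urns tuned to the target mu."""
    rng = random.Random(seed)
    m = round((n ** (r + 1) / (factorial(r + 1) * mu)) ** (1 / r))
    return [overflow((rng.randrange(m) for _ in range(n)), r) for _ in range(trials)]

def poisson_pmf(mu, j):
    return exp(-mu) * mu ** j / factorial(j)

samples = simulate_uniform(n=200, r=1, mu=1.0, trials=2000)
for j in range(4):
    # empirical frequency of {V = j} next to the Poisson(1) probability
    print(j, samples.count(j) / len(samples), round(poisson_pmf(1.0, j), 4))
```

For moderate $n$ the empirical frequencies should only roughly track the Poisson probabilities, since the theorem is a limit statement.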

Examples

$\bullet$ Consider the uniform case, that is, $p_{n,j}=1/m_{n}$, for $j\in M_{n}=\{1,\ldots,m_{n}\}$. Then by the above theorem we get

$$\frac{n^{r+1}}{m_{n}^{r}}\to(r+1)!\,\mu\quad\Rightarrow\quad V_{n,r}\stackrel{d}{\to}\mathrm{Pois}(\mu).$$

Illustrative simulations are visualized in Figure 1.

$\bullet$ Consider the geometric case, $p_{n,j}=p_{n}(1-p_{n})^{j}$, $j\geq 0$. Then

$$n^{r+1}\,\mathbb{E}\,p_{X_{n}}^{r}=\frac{(np_{n})^{r+1}}{1-(1-p_{n})^{r+1}}.\tag{13}$$

Take $p_{n}=n^{-\tfrac{r+1}{r}}$ (that is $n^{r+1}p_{n}^{r}=1$ ). Thus, by (13), $n^{r+1}\mathbb{E}\,p_{X_{n}}^{r}=\tfrac{p_{n}}{(r+1)p_{n}+o(p_{n})}\to\tfrac{1}{r+1}.$ Moreover, $np_{n}^{*}=np_{n}=n^{-\tfrac{1}{r}}\to 0$ .

Consequently, the above theorem yields $V_{n,r}\stackrel{{\scriptstyle d}}{{\to}}\,\mathrm{Pois}(\mu)$ with $\mu=\tfrac{1}{(r+1)!(r+1)}$ . Illustrative simulations are visualized in Figure 2.
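The limit in the geometric example can be checked numerically. The sketch below evaluates the closed form $(np_{n})^{r+1}/(1-(1-p_{n})^{r+1})$ with $p_{n}=n^{-(r+1)/r}$ and compares it with $1/(r+1)$; the function name is hypothetical, and `expm1`/`log1p` are used only to keep the denominator accurate for very small $p_{n}$.

```python
import math

def geometric_scaled_moment(n, r):
    """n^{r+1} E p_{X_n}^r for p_{n,j} = p_n (1 - p_n)^j with p_n = n^{-(r+1)/r}."""
    p = n ** (-(r + 1) / r)
    # 1 - (1 - p)^{r+1}, computed stably for small p
    denom = -math.expm1((r + 1) * math.log1p(-p))
    return (n * p) ** (r + 1) / denom

for r in (1, 2, 3):
    value = geometric_scaled_moment(10**4, r)
    mu = value / math.factorial(r + 1)   # approximate Poisson parameter
    print(r, value, 1 / (r + 1), mu)
```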

The method of Poissonization and de–Poissonization was used in [Hwang and Janson (2008), Theorem 8.2] to prove Theorem 2.1, for $r=1$ . The proof we present here is entirely different and relies on the following martingale-type convergence result from [Beśka et al. (1982)].

Theorem 2.2.

Let $\{Y_{n,k},\,k=1,\ldots,n;\,n\geq 1\}$ be a double sequence of non-negative random variables, adapted to a row-wise increasing double sequence of $\sigma$ -fields $\{\mathcal{F}_{n,k},\,k=1,\ldots,n;\,n\geq 1\}$ , and let $\eta>0$ . If

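$$\max_{1\leq k\leq n}\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})\stackrel{\mathbb{P}}{\to}0,\tag{14}$$

$$\sum_{k=1}^{n}\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})\stackrel{\mathbb{P}}{\to}\eta\tag{15}$$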

and, for any $\epsilon>0$ ,

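$$\sum_{k=1}^{n}\mathbb{E}\,\big(Y_{n,k}\,I_{\{|Y_{n,k}-1|>\epsilon\}}\,\big|\,\mathcal{F}_{n,k-1}\big)\stackrel{\mathbb{P}}{\to}0,\tag{16}$$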

then $\sum_{k=1}^{n}\,Y_{n,k}\stackrel{{\scriptstyle d}}{{\to}}\mathrm{Pois}(\eta)$ .

In the proof of Theorem 2.1 we use the following consequences of (11) and (12).

Lemma 2.3.

Let $s$ be a positive integer. If (11) and (12) hold, then

$$n^{s}\,\mathbb{E}\,p_{X_{n}}^{s}\to 0\tag{17}$$

and

$$n^{s+1}\,\mathbb{E}\,p_{X_{n}}^{s}\to 0,\quad s>r.\tag{18}$$

Proof.

Since $n^{s}\mathbb{E}\,p_{X_{n}}^{s}\leq(np_{n}^{*})^{s}$ , (17) follows from (12). Also, (18) follows from (11) and (12) since

$$n^{s+1}\,\mathbb{E}\,p_{X_{n}}^{s}\leq(np_{n}^{*})^{s-r}\,n^{r+1}\,\mathbb{E}\,p_{X_{n}}^{r}.$$

∎
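Both bounds used in this proof rest on nothing more than $p_{X_{n}}\leq p_{n}^{*}$. Spelled out as a short sketch, for a positive integer $s$ and, in the second display, $s>r$:

$$n^{s}\,\mathbb{E}\,p_{X_{n}}^{s}\leq n^{s}(p_{n}^{*})^{s}=(np_{n}^{*})^{s},$$

$$n^{s+1}\,\mathbb{E}\,p_{X_{n}}^{s}=n^{s-r}\,n^{r+1}\,\mathbb{E}\big(p_{X_{n}}^{s-r}\,p_{X_{n}}^{r}\big)\leq(np_{n}^{*})^{s-r}\,n^{r+1}\,\mathbb{E}\,p_{X_{n}}^{r}.$$

In the second display the first factor tends to $0$ because $np_{n}^{*}\to 0$ and $s-r\geq 1$, while the second factor converges to $(r+1)!\,\mu$.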

We also need the simple estimate shown below, for the tail of a binomial sum.

Lemma 2.4.

Let $m,n$ be positive integers, such that $m\leq n$ , and let $p\in(0,1)$ . Then

$$\sum_{i=m}^{n}\binom{n}{i}\,p^{i}(1-p)^{n-i}\leq\frac{(np)^{m}}{m!}.\tag{19}$$

Proof.

The left-hand side of (19) is $\mathbb{P}(B_{n}\geq m)$ , where $B_{n}$ has distribution ${\operatorname{Bin}}(n,p)$ . Arguing by induction on $n$ , we have

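$$\mathbb{P}(B_{n+1}\geq m)=\mathbb{P}(B_{n}\geq m-1)\,p+\mathbb{P}(B_{n}\geq m)(1-p)\leq\frac{(np)^{m-1}}{(m-1)!}\,p+\frac{(np)^{m}}{m!}(1-p)\leq\frac{((n+1)p)^{m}}{m!},$$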

where the last inequality follows from $mn^{m-1}+n^{m}(1-p)\leq(n+1)^{m}$ . ∎
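The binomial tail estimate of Lemma 2.4 is easy to probe numerically. The sketch below compares the exact tail $\mathbb{P}(B_{n}\geq m)$ with the bound $(np)^{m}/m!$ over a small grid; the grid values and the function names are arbitrary choices made for illustration.

```python
from math import comb, factorial

def binomial_tail(n, p, m):
    """Exact P(Bin(n, p) >= m)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m, n + 1))

def tail_bound(n, p, m):
    """The upper bound (n p)^m / m! from Lemma 2.4."""
    return (n * p) ** m / factorial(m)

worst_ratio = 0.0
for n in range(1, 13):
    for m in range(1, n + 1):
        for p in (0.01, 0.2, 0.5, 0.9):
            worst_ratio = max(worst_ratio, binomial_tail(n, p, m) / tail_bound(n, p, m))
print(worst_ratio)   # stays at or below 1 (up to rounding) on this grid
```

The bound is tight for $n=m=1$, where both sides equal $p$, and becomes loose when $np$ is large.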

3 Proof of Theorem 2.1

Proof.

We show that for $Y_{n,k}$ defined in (1), conditions (14), (15) with $\eta=\mu$ , and (16) are satisfied. First we note that (16) is trivially satisfied because, for $\epsilon<1$ , $Y_{n,k}=0$ if and only if $I_{\{|Y_{n,k}-1|>\epsilon\}}=1$ .

The rest of the proof is divided into three steps. In Step I we check that (14) is satisfied. Then we prove that (15) holds in quadratic mean, that is,

$$\mathbb{E}\Big(\sum_{k=1}^{n}\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})-\mu\Big)^{2}\to 0.$$

To that end we show that $\sum_{k=1}^{n}\,\mathbb{E}\,Y_{n,k}\to\mu$ and ${{\mathbb{V}}\mbox{ar}}\sum_{k=1}^{n}\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})\to 0$ in Step II and Step III, respectively.

Step I: We prove (14) using (8). Clearly, $I_{\{N_{n,k}(m)\geq r\}}\leq I_{\{N_{n,l}(m)\geq r\}}$ , for $k\leq l$ , so

[TABLE]

Note also that, due to (9), (19) and (17),

[TABLE]

Consequently, Markov’s inequality implies $\mathbb{E}\,(Y_{n,n}|\mathcal{F}_{n,n-1})\stackrel{{\scriptstyle\mathbb{P}}}{{\to}}0$ and thus (14) follows.

Step II: To prove that $\lim_{n}\sum_{k=1}^{n}\mathbb{E}\,Y_{n,k}=\mu$ we show that $\limsup_{n}$ and $\liminf_{n}$ are respectively bounded above and below by $\mu$ . From (9), (19) and (11)

[TABLE]

so $\limsup_{n}\,\sum_{k=1}^{n}\,\mathbb{E}\,Y_{n,k}\leq\mu$ .

Additionally, since by (9), $\mathbb{E}\,Y_{n,k}=\mathbb{P}(N_{n,k}(X_{n})\geq r)\geq\mathbb{P}(N_{n,k}(X_{n})=r)$ and $(1-p)^{k}\geq 1-kp,p\in(0,1)$ , we have

[TABLE]

Further, observe that

[TABLE]

Thus, by (11) and (18), the rhs of (20) converges to $\mu$ and so, $\liminf\limits_{n}\sum\limits_{k=1}^{n}\mathbb{E}\,Y_{n,k}\geq\mu$ .

Step III: We prove that $W_{n}:={{\mathbb{V}}\mbox{ar}}\,\sum_{k=1}^{n}\,\mathbb{E}\,(Y_{n,k}|\mathcal{F}_{n,k-1})\to 0$ , relying on the NOD property of $N_{n,k}(m_{1}),N_{n,l}(m_{2})$ , for distinct $m_{1},m_{2}\in M_{n}$ . In what follows we compute and bound some expectations that add up to $W_{n}$ . First note from (10) that

[TABLE]

For $U,V$ square-integrable random variables and $\cal G$ a $\sigma$ -algebra, let the conditional covariance be defined as

[TABLE]

Also, let $I_{k}(m)=I_{\{N_{n,k}(m)\geq r\}}$ (for simplicity) and $k\wedge l=\min\{k,l\}$ . Then, by the iid assumption of $X_{n,1},\ldots,X_{n,n},X_{n},Y_{n}$ , we have

[TABLE]

Furthermore,

[TABLE]

where the last equality follows from $I_{k}(m)\leq I_{l}(m)$ , for $k\leq l$ , because $N_{n,k}(m)\geq r$ implies $N_{n,l}(m)\geq r$ . So, from (21) and (22), we get

[TABLE]

Furthermore, by the NOD property (5),

[TABLE]

Hence, from (21) and (24), we have

[TABLE]

And, finally, from (23) and (25),

[TABLE]

which, after taking expectation, yields

[TABLE]

Also, by (19),

[TABLE]

Last, taking expectation above and adding over $k$ and $l$ , from (27) we obtain

[TABLE]

where convergence to 0 follows from (18). Finally, since $W_{n}\geq 0$ , it follows that $W_{n}\to 0$ . ∎

4 Normal asymptotics for overflow

The following theorem gives conditions under which the overflow is asymptotically normal.

Theorem 4.1.

Assume that $np_{n}^{*}\to\lambda\geq 0$ and that $n^{r+1}\mathbb{E}\,p_{X_{n}}^{r}\to\infty$ . Then

[TABLE]

Examples

$\bullet$ Consider the uniform case, i.e. $p_{n,j}=1/m_{n}$ , $j\in M_{n}=\{1,\ldots,m_{n}\}$ . Then by the above theorem we get

[TABLE]

Note that $m_{n}=\kappa n^{a}$ with $a\in[1,\,1+r^{-1})$ yields normal asymptotics.

$\bullet$ Consider the geometric case, $p_{n,j}=p_{n}(1-p_{n})^{j}$ , $j\geq 0$ , with $p_{n}=n^{-a}$ and $a\in[1,\,1+r^{-1})$ . Then (13) yields

[TABLE]

Moreover,

[TABLE]

Thus, asymptotic normality of $V_{n,r}$ follows from the above theorem. Illustrative simulations are visualized in Figures 3 and 4.
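For the uniform example with $m_{n}=\kappa n^{a}$, the two hypotheses of Theorem 4.1 can be verified by a direct exponent count. The following is a sketch that uses only $p_{n}^{*}=1/m_{n}$ and $\mathbb{E}\,p_{X_{n}}^{r}=m_{n}^{-r}$:

$$np_{n}^{*}=\frac{n}{m_{n}}=\frac{n^{1-a}}{\kappa}\to\lambda=\begin{cases}1/\kappa,&a=1,\\ 0,&a>1,\end{cases}\qquad n^{r+1}\,\mathbb{E}\,p_{X_{n}}^{r}=\frac{n^{r+1}}{m_{n}^{r}}=\frac{n^{r+1-ar}}{\kappa^{r}}\to\infty\iff a<1+r^{-1}.$$

So the range $a\in[1,\,1+r^{-1})$ is exactly where $np_{n}^{*}$ stays bounded while $n^{r+1}\,\mathbb{E}\,p_{X_{n}}^{r}$ diverges; at the endpoint $a=1+r^{-1}$ the second quantity converges to $\kappa^{-r}$ instead, which is the regime of the Poissonian limit.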

The proof of Theorem 4.1 is split into several steps, given in the four subsections below. In Subsection 4.1 we decompose $V_{n,r}-\mathbb{E}\,V_{n,r}$ into a sum of martingale differences $\sum_{k=1}^{n}\,d_{n,k}$, with suitably defined (uniformly bounded) $d_{n,k}$’s. In Subsection 4.2 we show that ${{\mathbb{V}}\mbox{ar}}\,V_{n,r}$ is of order $n^{r+1}\mathbb{E}\,p_{X_{n}}^{r}$. In Subsection 4.3 we show that ${{\mathbb{V}}\mbox{ar}}\,\sum_{k=1}^{n}{{\mathbb{V}}\mbox{ar}}\,(d_{n,k}|\mathcal{F}_{n,k-1})$ is of order $o((n^{r+1}\mathbb{E}\,p_{X_{n}}^{r})^{2})$. The final part of the proof, which gathers all previous steps, is given in Subsection 4.4.

4.1 Martingale differences decomposition

Lemma 4.2.

The centered size of the overflow can be represented as $V_{n,r}-\mathbb{E}\,V_{n,r}=\sum_{k=1}^{n}d_{n,k}$ , where the $d_{n,k}$ are martingale differences defined by

[TABLE]

Proof.

Clearly, $\mathbb{E}(d_{n,k}|\mathcal{F}_{n,k-1})=0$ . Further, noting that $\mathcal{F}_{n,0}$ is the trivial $\sigma$ -algebra,

[TABLE]

∎

Lemma 4.3.

The martingale differences $d_{n,k}$ of (28) are uniformly bounded and can be represented as

[TABLE]

Proof.

Let $n,r\in{\mathbb{N}},j>k$ and note that $N_{n,j}(X_{n})=N_{n,k}(X_{n})+I_{\{X_{n,k}=X_{n}\}}+N_{n,k+1}^{j}(X_{n})$ . For simplicity let $U_{j}=N_{n,k}(X_{n})+N_{n,k+1}^{j}(X_{n}),V=N_{n,k}(X_{n})$ and $I=I_{\{X_{n,k}=X_{n}\}}$ . Then

[TABLE]

Hence, noting that $\{V\geq r\}=\{N_{n,k}(X_{n})\geq r\}\subseteq\{N_{n,j}(X_{n})\geq r\}=\{U_{j}+I\geq r\}$ , we have

[TABLE]

Consequently, from (7), we can write

[TABLE]

and, similarly,

[TABLE]

Also, note that

[TABLE]

Therefore, for $j>k$ ,

[TABLE]

Thus

[TABLE]

Observe that, for $j>k$ , $\mathbb{E}\,\Big{(}\tfrac{I_{\{X_{n,j}=X_{n}\}}}{p_{X_{n}}}|X_{n},\mathcal{F}_{n,j-1}\Big{)}=1$ . Then

[TABLE]

Note that $\sum_{j=k+1}^{n}I_{\{U_{j}=r-1\}}I_{\{X_{n,j}=X_{n}\}}=I_{\{U_{j}=r-1,U_{j+1}=r,\text{ for some }j\in\{k+1,\ldots,n\}\}}$ , is equal to $I_{\{U_{n+1}\geq r\}}$ on the event $\{N_{n,k}(X_{n})<r\}$ . That is, using the original notation,

[TABLE]

on the event $\{N_{n,k}(X_{n})<r\}$ and so,

[TABLE]

Finally, since

[TABLE]

we conclude that

[TABLE]

For the boundedness of $d_{n,k}$ note that

[TABLE]

∎

4.2 Asymptotic variance

Lemma 4.4.

Assume that $np_{n}^{*}\to\lambda\geq 0$ and that $n^{r+1}\mathbb{E}\,p_{X_{n}}^{r}\to\infty$ . Then

[TABLE]

Proof.

Let $p_{x}=p_{n,x}$ , $q_{x}=1-p_{x}$ and

[TABLE]

Then

[TABLE]

and so

[TABLE]

Also, recalling that $X_{n},Y_{n},X_{n,1},\ldots,X_{n,n}$ are iid,

[TABLE]

where the second equality above follows from the conditional independence of $\tfrac{I_{\{X_{n,k}=X_{n}\}}-p_{X_{n}}}{p_{X_{n}}}T_{n,k}(X_{n})$ and $\tfrac{I_{\{X_{n,k}=Y_{n}\}}-p_{Y_{n}}}{p_{Y_{n}}}T_{n,k}(Y_{n})$ , given $\mathcal{F}_{n,k}$ .

In what follows we compute $\mathbb{E}(d_{n,k}^{2}|\mathcal{F}_{n,k-1})$ by considering the cases $X_{n}=Y_{n}$ and $X_{n}\not=Y_{n}$ . We get

[TABLE]

where the second equality above follows from conditioning inside both expectations above, with respect to $X_{n},Y_{n},\mathcal{F}_{n,k-1}$ . Finally, integrating out $Y_{n}$ in the first expectation, we obtain

[TABLE]

and, consequently,

[TABLE]

For the upper bound of the variance note that $0<T_{n,k}(X_{n})\leq 1$ and thus (34) implies

[TABLE]

Also,

[TABLE]

and so,

[TABLE]

Now, recalling that $N_{n,n+1}(m)$ has distribution ${\operatorname{Bin}}(n,p_{n,m})$ , for $m\in M_{n}$ , and using (19), the rhs of (35) is bounded by $n^{r}p^{r}_{X_{n}}/r!$ . Last, taking expectations, we obtain ${{\mathbb{V}}\mbox{ar}}\,d_{n,k}\leq n^{r}\mathbb{E}\,p^{r}_{X_{n}}/r!$ and, consequently,

[TABLE]

Now, to bound the variance of $d_{n,k}$ from below, we first find an upper bound for the last term (with minus sign) in display (34). To that end note that $T_{n,k}(x)$, $x\in M_{n}$, defined in (32), can be written as

[TABLE]

where $B_{n-k}(x)$ is ${\operatorname{Bin}}(n-k,p_{x})$ , independent of $X_{n},Y_{n},\mathcal{F}_{n,n}$ , so

[TABLE]

and

[TABLE]

Furthermore, for $y\neq x$ , let $B_{n-k}(y)$ be ${\operatorname{Bin}}(n-k,p_{y})$ , independent of $X_{n},Y_{n},\mathcal{F}_{n,n}$ and independent of $B_{n-k}(x)$ . Then

[TABLE]

can be written as

[TABLE]

and so,

[TABLE]

Then, since, conditionally on $X_{n},Y_{n}$ , $(N_{n,k}(X_{n}),N_{n,k}(Y_{n}))$ is $\mathrm{Mn}_{2}(k-1,p_{X_{n}},p_{Y_{n}})$ and because of the NOD property, we have $\mathbb{E}(J_{n,k}|X_{n},Y_{n})$

[TABLE]

where the second equality follows from the NOD property and the third from (37). Finally, taking expectations and using the independence of $X_{n}$ and $Y_{n}$ , we get

[TABLE]

Replacing the rightmost expectation in display (34) by the bound above we have

[TABLE]

Note that

[TABLE]

Hence, since $np_{n}^{*}\to\lambda$ ,

[TABLE]

Finally note that $T_{n,k}(x)$ , as defined in (32), can be written in the form

[TABLE]

where $P_{j}(x)=\binom{n-k}{j}\,p_{x}^{j}\,q_{x}^{n-k-j}$ and $I_{j}(x)=I_{\{N_{n,k}(x)\geq r-j\}}$ . Therefore,

[TABLE]

Since $I_{j_{1}}(x)\leq I_{j_{2}}(x)$ for $j_{1}\leq j_{2}$, it follows that the double sum above is non-negative and so,

[TABLE]

Consequently,

[TABLE]

and finally, since $np_{n}^{*}\to\lambda$ ,

[TABLE]

∎

4.3 Variance of the sum of conditional variances

Lemma 4.5.

Under the hypotheses of Lemma 4.4

[TABLE]

Proof.

We first rewrite (33) as

[TABLE]

where

[TABLE]

Consequently, letting $W_{n}^{\alpha}={{\mathbb{V}}\mbox{ar}}\sum_{k=1}^{n}\,\alpha_{n,k}$ , $W_{n}^{\beta}={{\mathbb{V}}\mbox{ar}}\sum_{k=1}^{n}\,\beta_{n,k}$ and noting that ${{\mathbb{V}}\mbox{ar}}\,(X+Y)\leq 2({{\mathbb{V}}\mbox{ar}}\,X+{{\mathbb{V}}\mbox{ar}}\,Y)$ , we have

[TABLE]

Then

[TABLE]

and the analogous formula holds for $W_{n}^{\beta}$ . In what follows we express the variances and covariances of $\alpha_{n,k},\beta_{n,k}$ in terms of $A_{n,k}(X_{n}),B_{n,k}(X_{n},Y_{n})$ . For simplicity, let $Z_{n}=(X_{n},Y_{n}),Z^{\prime}_{n}=(X^{\prime}_{n},Y^{\prime}_{n})$ , then

[TABLE]

where $X_{n}^{\prime}$ and $Y_{n}^{\prime}$ are such that $X_{n},X_{n}^{\prime},Y_{n},Y_{n}^{\prime},X_{n,1},\ldots,X_{n,n}$ are iid for any $n\geq 1$ . We only check the first formula; the others are obtained similarly.

[TABLE]

and the formula for ${{\mathbb{V}}\mbox{ar}}\,\alpha_{n,k}$ follows. We now compute bounds for the covariances in (42). Since $A_{n,k}(x)$ and $B_{n,k}(x,y)$ are bounded above by $T_{n,k}(x)\leq 1$ reasoning as in the paragraph preceding (36), we have,

[TABLE]

and

[TABLE]

Next, we handle ${{\mathbb{C}}\mbox{ov}}(A_{n,k}(X_{n}),A_{n,l}(X_{n}^{\prime}))$ , which requires somewhat more effort than the previous covariances because the crude bounds do not yield the right order in $n$ . Since $A_{n,k}(x)=(1-p_{x})T^{2}_{n,k}(x)$ ,

[TABLE]

because each of the remaining three covariances is bounded by an expression of the form $\mathbb{E}p_{X_{n}}T_{n,k}(X_{n})\leq cn^{r}\mathbb{E}p_{X_{n}}^{r+1}$ . To bound the covariance between $T_{n,k}^{2}(X_{n})$ and $T^{2}_{n,l}(X^{\prime}_{n})$ we write

[TABLE]

and note that the first expectation in (46) is bounded by

[TABLE]

where $c$ is a positive constant. For the second expectation in (46) we have the following expression, written in terms of (conditionally independent) binomial random variables $B_{1},B_{2},B_{1}^{\prime},B_{2}^{\prime}$ .

[TABLE]

Conditionally on $(X_{n},X_{n}^{\prime})$ , $B_{1},B_{2},B_{1}^{\prime},B_{2}^{\prime}$ are independent, with $B_{1},B_{2}$ distributed ${\operatorname{Bin}}(n-k,p_{X_{n}})$ and $B^{\prime}_{1},B^{\prime}_{2}$ distributed ${\operatorname{Bin}}(n-k,p_{X^{\prime}_{n}})$ . Further, $B_{1},B_{2},B_{1}^{\prime},B_{2}^{\prime}$ are independent of $\mathcal{F}_{n,k},\mathcal{F}_{n,l}$ , conditionally on $(X_{n},X_{n}^{\prime})$ .

Note that (48) can be rewritten as

[TABLE]

where $B_{12}=\min\{B_{1},B_{2}\}$ and $B^{\prime}_{12}=\min\{B^{\prime}_{1},B^{\prime}_{2}\}$ . Note also that, for $x\neq y$ , $N_{n,k}(x)$ and $N_{n,l}(y)$ are NOD; see (5). Thus, conditioning on the values of the binomials and using the NOD property, then integrating over the $B$ ’s and using the independence of $X_{n}$ and $X^{\prime}_{n}$ , we obtain the following upper bound for (49):

[TABLE]

which, after ignoring the indicator and noting that the conditional probabilities (given $X_{n}$ and $X_{n}^{\prime}$ ) are independent random variables, can finally be bounded by

[TABLE]

Therefore, from (45), (46), (47) and (50), we have

[TABLE]

It remains to bound the covariances ${{\mathbb{C}}\mbox{ov}}\,(B_{n,k}(Z_{n}),B_{n,l}(Z^{\prime}_{n}))$ . To that end, we first consider the expected value of the product:

[TABLE]

where $D$ is the event that $X_{n},Y_{n},X^{\prime}_{n},Y^{\prime}_{n}$ are all distinct. Then,

[TABLE]

Note that, as in (48), the first term on the right-hand side of (52) can be written as follows:

[TABLE]

Conditionally on $(Z_{n},Z^{\prime}_{n})$ , $B_{1},B_{2},B_{1}^{\prime},B_{2}^{\prime}$ are independent, where $B_{1}$ is ${\operatorname{Bin}}(n-k,p_{X_{n}})$ , $B_{2}$ is ${\operatorname{Bin}}(n-k,p_{Y_{n}})$ , $B^{\prime}_{1}$ is ${\operatorname{Bin}}(n-k,p_{X^{\prime}_{n}})$ and $B^{\prime}_{2}$ is ${\operatorname{Bin}}(n-k,p_{Y^{\prime}_{n}})$ . Also, $B_{1},B_{2},B_{1}^{\prime},B_{2}^{\prime}$ are independent of $\mathcal{F}_{n,k},\mathcal{F}_{n,l}$ , conditionally on $(Z_{n},Z_{n}^{\prime})$ . Now, using the NOD property (4) and the independence of $X_{n},Y_{n},X^{\prime}_{n}$ , $Y^{\prime}_{n}$ , the expression in (54) is bounded above by

[TABLE]

Therefore, from (52), (53) and (55),

[TABLE]

We complete the proof of (38) by collecting the partial results above to obtain bounds for $W_{n}^{\alpha}$ and $W_{n}^{\beta}$ , using formula (41). From (43) and (44) we have

[TABLE]

From (45) and (51)

[TABLE]

Last, from (56)

[TABLE]

The conclusion follows from (40), (41) and the bounds for the sums of variances and covariances above. ∎

4.4 Final touch - the martingale CLT

We show the asymptotic normality by applying the martingale central limit theorem (see e.g. [Helland (1982), Theorem 2.5]) to the martingale differences $(d_{n,k})$ . Since the $d_{n,k}$ ’s are uniformly bounded, the conditional Lindeberg condition ([Helland (1982), condition (2.5)]) follows from the fact that the variance of the sum grows to infinity as $n\to\infty$ . The remaining condition to be checked ([Helland (1982), condition (2.7)]) is that

[TABLE]

as $n\to\infty$ or, equivalently, that

[TABLE]

But this follows immediately from Lemma 4.4, Lemma 4.5 and Chebyshev’s inequality.
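As an informal sanity check of the normal approximation (not a substitute for the proof), one may simulate the overflow directly from its sequential definition. The sketch below makes several assumptions that are ours and not the paper's: the urns are equiprobable, their number equals $n$ (so that $np_{n}^{*}=1$ ), the capacity is $r=2$ , and the function name `simulate_overflow`, the seed and the sample sizes are hypothetical. It records the fraction of simulated values lying within $1.96$ empirical standard deviations of the empirical mean, which should be near $0.95$ if the distribution is close to normal.

```python
import random
import statistics

def simulate_overflow(n, num_urns, r, rng):
    """Throw n balls into equiprobable urns of capacity r; return the overflow.

    A ball counts towards the overflow when the urn it selects
    already contains at least r balls.
    """
    counts = [0] * num_urns
    overflow = 0
    for _ in range(n):
        urn = rng.randrange(num_urns)
        if counts[urn] >= r:
            overflow += 1
        counts[urn] += 1
    return overflow

rng = random.Random(2019)                     # fixed seed for reproducibility
n, num_urns, r, reps = 1000, 1000, 2, 300     # illustrative sizes (assumed)

samples = [simulate_overflow(n, num_urns, r, rng) for _ in range(reps)]
mean = statistics.fmean(samples)
sd = statistics.pstdev(samples)

# Fraction of samples within 1.96 empirical standard deviations of the mean.
coverage = sum(abs(v - mean) <= 1.96 * sd for v in samples) / reps
print(f"mean={mean:.1f} sd={sd:.1f} coverage={coverage:.3f}")
```

For this configuration each urn receives approximately a Poisson(1) number of balls, so the mean overflow should be close to $n(3/e-1)\approx 0.104\,n$ .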

5 Asymptotics for number of full containers with and without overflow

Let $L_{n,r}$ denote the number of full containers and $M_{n,r}$ denote the number of full containers without overflow. The main idea is to represent $L_{n,r}$ and $M_{n,r}$ in terms of the size of the overflow $V_{n,r}$ .

Recall that $N_{n,n+1}(m)$ is the total number of balls in the sample for which the $m$ th box was selected. Thus

[TABLE]

We note that

[TABLE]

That is,

[TABLE]

and

[TABLE]

Note that in the case $r=1$ we have $V_{n,0}=n$ and thus $L_{n,1}$ , which is the number of non-empty boxes, is

[TABLE]

and $M_{n,1}$ , which is the number of singleton boxes, is

[TABLE]
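The representations above can be checked numerically on a single sample. The sketch below assumes the natural reading of the definitions: a full container is an urn holding at least $r$ balls, a full container without overflow holds exactly $r$ balls, and $V_{n,s}=\sum_{m}(N_{n,n+1}(m)-s)_{+}$ . Under that reading one gets $L_{n,r}=V_{n,r-1}-V_{n,r}$ and $M_{n,r}=V_{n,r-1}-2V_{n,r}+V_{n,r+1}$ , which is consistent with the case $r=1$ discussed above. The urn weights, the seed and the function names are hypothetical.

```python
import random
from collections import Counter

def overflow(counts, s):
    """V_{n,s}: number of balls landing in urns that already hold s balls."""
    return sum(max(c - s, 0) for c in counts)

def full_containers(counts, r):
    """L_{n,r}: urns holding at least r balls (assumed reading of 'full')."""
    return sum(1 for c in counts if c >= r)

def full_without_overflow(counts, r):
    """M_{n,r}: urns holding exactly r balls (assumed reading)."""
    return sum(1 for c in counts if c == r)

rng = random.Random(57)
n = 500
weights = [1.0 / (m + 1) for m in range(50)]   # hypothetical non-uniform urns
balls = rng.choices(range(len(weights)), weights=weights, k=n)
tally = Counter(balls)
counts = [tally[m] for m in range(len(weights))]   # N_{n,n+1}(m) for each urn m

# Both identities are deterministic, so they must hold for every r.
identities_hold = all(
    full_containers(counts, r)
    == overflow(counts, r - 1) - overflow(counts, r)
    and full_without_overflow(counts, r)
    == overflow(counts, r - 1) - 2 * overflow(counts, r) + overflow(counts, r + 1)
    for r in range(1, 8)
)
print(identities_hold)
```

The identities hold pathwise because $(N-r+1)_{+}-(N-r)_{+}=I_{\{N\geq r\}}$ for every integer $N\geq 0$ , and $I_{\{N=r\}}=I_{\{N\geq r\}}-I_{\{N\geq r+1\}}$ .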

These representations of $M_{n,r}$ and $L_{n,r}$ in terms of $V_{n,r-1}$ , $V_{n,r}$ and $V_{n,r+1}$ allow us to read off Poissonian asymptotics of these two sequences from Theorem 2.1. For $M_{n,r}$ the forthcoming statement was proved in [Kolchin et al. (1978), Theorem III.3.1].

Theorem 5.1.

Assume that $np_{n}^{*}\to 0$ .

1. If $r>1$ and $n^{r}\,\mathbb{E}\,p_{X_{n}}^{r-1}\to r!\mu$ then

[TABLE]

2. If $r=1$ and $n^{2}\,\mathbb{E}\,p_{X_{n}}\to\mu$ then

[TABLE]

Proof.

The case $r>1$ : Due to representations (57) and (58), to prove both results it suffices to show that $\mathbb{E}\,V_{n,s}\to 0$ for any fixed $s\geq r$ . Following the argument from the beginning of Step II of the proof of Theorem 2.1, we see that

[TABLE]

where the convergence to zero in the last step follows from Lemma 2.3.

The case $r=1$ : The first part follows from Theorem 2.1 since (59) implies $n-L_{n,1}=V_{n,1}$ . The second part also follows from Theorem 2.1 since (60) gives

[TABLE]

and, as in the case $r>1$ , we have $\mathbb{E}\,V_{n,2}\to 0$ . ∎
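The case $r=1$ can be illustrated by simulation, using the identity $n-L_{n,1}=V_{n,1}$ from the proof. The sketch below assumes $M$ equiprobable urns with $M=n^{2}/\mu$ , so that $n^{2}\,\mathbb{E}\,p_{X_{n}}=n^{2}/M=\mu$ and $np_{n}^{*}=\mu/n$ is small. This uniform configuration, the seed, and the names `collisions` and `expected_collisions` are our assumptions. For equiprobable urns the mean of $V_{n,1}$ is known exactly, $\mathbb{E}\,V_{n,1}=n-M(1-(1-1/M)^{n})\approx n^{2}/(2M)$ , and the sample mean and variance should be close to each other if the distribution is approximately Poisson.

```python
import random

def collisions(n, num_urns, rng):
    """V_{n,1} for capacity r = 1: balls that land in a non-empty urn."""
    occupied = set()
    overflow = 0
    for _ in range(n):
        urn = rng.randrange(num_urns)
        if urn in occupied:
            overflow += 1
        else:
            occupied.add(urn)
    return overflow

def expected_collisions(n, num_urns):
    """Exact E V_{n,1} = n - M * (1 - (1 - 1/M)^n) for M equiprobable urns."""
    return n - num_urns * (1.0 - (1.0 - 1.0 / num_urns) ** n)

rng = random.Random(1)          # fixed seed for reproducibility
mu = 2.0                        # assumed limit of n^2 * E p_{X_n}
n = 200
num_urns = int(n * n / mu)      # M = n^2 / mu = 20000
reps = 2000

samples = [collisions(n, num_urns, rng) for _ in range(reps)]
sample_mean = sum(samples) / reps
sample_var = sum((v - sample_mean) ** 2 for v in samples) / reps
exact_mean = expected_collisions(n, num_urns)

print(f"exact mean={exact_mean:.3f} "
      f"sample mean={sample_mean:.3f} sample variance={sample_var:.3f}")
```

Equality of mean and variance is only a necessary feature of a Poisson law; the comparison is a plausibility check rather than a test of the theorem.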

Note that under the assumptions of Theorem 5.1:

• in case 1: $L_{n,r}-M_{n,r}\stackrel{{\scriptstyle\mathbb{P}}}{{\to}}0$ ,

• in case 2: $\tfrac{L_{n,1}-M_{n,1}}{n}\stackrel{{\scriptstyle\mathbb{P}}}{{\to}}1.$
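The first statement can be observed empirically for $r=2$ . In the sketch below the urns are equiprobable and their number is $M=n^{2}/(2\mu)$ , so that $n^{2}\,\mathbb{E}\,p_{X_{n}}=n^{2}/M=2!\,\mu$ and $np_{n}^{*}\to 0$ . These choices, the seed and the function name `full_counts` are assumptions made for illustration. We count urns with at least two balls and urns with exactly two balls, and record how often the two counts differ.

```python
import random
from collections import Counter

def full_counts(n, num_urns, r, rng):
    """Return (urns with >= r balls, urns with exactly r balls) for one sample."""
    tally = Counter(rng.randrange(num_urns) for _ in range(n))
    full = sum(1 for c in tally.values() if c >= r)
    exactly_full = sum(1 for c in tally.values() if c == r)
    return full, exactly_full

rng = random.Random(5)               # fixed seed for reproducibility
n, r, mu = 200, 2, 1.0               # illustrative values (assumed)
num_urns = int(n * n / (2 * mu))     # M = n^2 / (2 mu) = 20000
reps = 300

results = [full_counts(n, num_urns, r, rng) for _ in range(reps)]
frac_different = sum(1 for full, exact in results if full != exact) / reps
mean_full = sum(full for full, _ in results) / reps

print(f"fraction with L != M: {frac_different:.3f}, mean of L: {mean_full:.3f}")
```

The two counts differ only when some urn receives three or more balls, an event whose expected number of occurrences is of order $n^{3}/M^{2}$ here, which is why the difference is rarely non-zero.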

Representations (57) and (58) are also useful for getting Gaussian asymptotics of $L_{n,r}$ and $M_{n,r}$ from Theorem 4.1 in the case $\lambda=0$ .

Theorem 5.2.

Assume that $np_{n}^{*}\to 0$ and $r\geq 1$ .

1. If $n^{r+1}\mathbb{E}\,p_{X_{n}}^{r}\to\infty$ then

[TABLE]

2. If $n^{r+2}\mathbb{E}\,p_{X_{n}}^{r+1}\to\infty$ then

[TABLE]

Proof.

By representation (57) we can write

[TABLE]

Since $n^{r+1}\mathbb{E}\,p_{X_{n}}^{r}\leq np_{n}^{*}\,n^{r}\mathbb{E}\,p_{X_{n}}^{r-1}$ and $np_{n}^{*}\to 0$ , it follows that $n^{r}\mathbb{E}\,p_{X_{n}}^{r-1}\to\infty$ . Therefore, by Lemma 4.4, we have

[TABLE]

and thus also

[TABLE]

Consequently, $\tfrac{V_{n,r}-\mathbb{E}\,V_{n,r}}{\sqrt{{{\mathbb{V}}\mbox{ar}}\,L_{n,r}}}\stackrel{{\scriptstyle L^{2}}}{{\to}}0$ . Thus the first result is a consequence of Theorem 4.1 since, in view of the representation (57),

[TABLE]

For the second case, by representation (58) we can write

[TABLE]

As in the previous case, we conclude that $n^{s}\mathbb{E}\,p_{X_{n}}^{s-1}\to\infty$ for $s=r,r+1$ . Therefore, by the same argument as above, each of the summands on the right-hand side above, except the first one, converges to 0 as $n\to\infty$ . Consequently, $\tfrac{V_{n,s}-\mathbb{E}\,V_{n,s}}{\sqrt{{{\mathbb{V}}\mbox{ar}}\,M_{n,r}}}\stackrel{{\scriptstyle L^{2}}}{{\to}}0$ , $s=r,r+1$ . Thus the second result is a consequence of Theorem 4.1 since, in view of (58),

[TABLE]

∎

Bibliography


1. [Arratia et al. (2016)] Arratia, R., Garibaldi, S., Kilian, J. Asymptotic distribution for the birthday problem with multiple coincidences, via an embedding of the collision process. Random Structures & Algorithms 48 (2016), 480–502.
2. [Beśka et al. (1982)] Beśka, M., Kłopotowski, A., Słomiński, L. Limit theorems for random sums of dependent $d$-dimensional random vectors. Z. Wahrschein. verw. Geb. 61 (1982), 43–57.
3. [Bobecka et al. (2013)] Bobecka, K., Hitczenko, P., López-Blázquez, F., Rempała, G., Wesołowski, J. Asymptotic normality through factorial cumulants and partition identities. Combin. Probab. Comput. 22(2) (2013), 213–240.
4. [Chao and Chiu (2016)] Chao, A., Chiu, C.-H. Species richness: estimation and comparison. Wiley StatsRef: Statistics Reference Online, 1–26.
5. [Dupuis et al. (2004)] Dupuis, P., Nuzman, C., Whiting, P. Large deviation asymptotics for occupancy problems. Ann. Probab. 32 (2004), 2765–2818.
6. [Gnedin et al. (2007)] Gnedin, A., Hansen, B., Pitman, J. Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws. Probab. Surv. 4 (2007), 146–171.
7. [Helland (1982)] Helland, I. S. Central limit theorems for martingales with discrete or continuous time. Scand. J. Statist. 9 (1982), 79–94.
8. [Hwang and Janson (2008)] Hwang, H.-K., Janson, S. Local limit theorems for finite and infinite urn models. Ann. Probab. 36(3) (2008), 992–1022.

Footnote to the title: This material is based upon work supported by and while serving at the National Science Foundation. Any opinion, findings, and conclusions or recommendations expressed in this material are