Rates of convergence for extremal spacings in Kakutani's random interval-splitting process

Fraser Daly; Andrew Wade

arXiv:2508.20749·math.PR·August 29, 2025

TL;DR

This paper analyzes the rate of convergence in the distribution of extremal spacings in Kakutani's interval-splitting process, providing quantitative bounds and connecting to branching processes.

Contribution

It provides quantitative bounds on the rates of convergence for the maximum and minimum sub-interval lengths in Kakutani's process: a Berry-Esseen bound, with conjecturally optimal rate, for the maximum, and convergence to an exponential distribution, with quantitative bounds, for the minimum.

Findings

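01. Central limit theorem for the maximum sub-interval length, with a quantitative (Berry-Esseen) bound

02. Convergence to an exponential distribution for the minimum sub-interval length, with quantitative bounds

03. Proof via conditioning on an intermediate time, a Hermite-Edgeworth expansion, and moment estimates with quantitative error bounds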

Abstract

Kakutani's random interval-splitting process iteratively divides, via a uniformly random splitting point, the largest sub-interval in a partition of the unit interval. The length of the longest sub-interval after $n$ steps, suitably centred and scaled, is known to satisfy a central limit theorem as $n \to \infty$ . We provide a quantitative (Berry-Esseen) upper bound for the finite- $n$ approximation in the central limit theorem, with conjecturally optimal rates in $n$ . We also prove convergence to an exponential distribution for the length of the smallest sub-interval, with quantitative bounds. The Kakutani process can be embedded in certain branching and fragmentation processes, and we translate our results into that context also. Our proof uses conditioning on an intermediate time, a conditional independence structure for statistics involving small sub-intervals, an Hermite-Edgeworth…

Equations

$$\lim_{n\to\infty}nM_{n}=2,\ \text{a.s.}$$

$$\lim_{n\to\infty}\operatorname{\mathbb{P}}\left(\sqrt{\frac{n^{3}}{\sigma^{2}}}\left(M_{n}-\frac{2}{n}\right)\leq x\right)=\Phi(x),\text{ for every }x\in{\mathbb{R}}.$$

$$\sup_{x\in{\mathbb{R}}}\left|\operatorname{\mathbb{P}}\left(\sqrt{\frac{n^{3}}{\sigma^{2}}}\left(M_{n}-\frac{2}{n}\right)\leq x\right)-\Phi(x)\right|\leq\frac{C}{\sqrt{n}},\text{ for every }n\in{\mathbb{N}},$$

$$\sup_{x\in{\mathbb{R}}_{+}}\left|\operatorname{\mathbb{P}}\left(\frac{n^{2}m_{n}}{2}>x\right)-{\mathrm{e}}^{-x}\right|\leq\frac{C(1+\log n)}{\sqrt{n}},\text{ for every }n\in{\mathbb{N}}.$$

$$\sup_{x\in{\mathbb{R}}}\left|\operatorname{\mathbb{P}}\left(\sqrt{\frac{2}{\sigma^{2}}}\,{\mathrm{e}}^{-t/2}\left(T_{t}-2{\mathrm{e}}^{t}\right)\leq x\right)-\Phi(x)\right|\leq C{\mathrm{e}}^{-t/2},\text{ for every }t\in{\mathbb{R}}_{+}.$$

$$\sup_{x\in{\mathbb{R}}}\left|\operatorname{\mathbb{P}}\left(\sqrt{\frac{4n}{\sigma^{2}}}\left(\ell_{n}-\log(n/2)\right)\leq x\right)-\Phi(x)\right|\leq\frac{C}{\sqrt{n}},\text{ for every }n\in{\mathbb{N}}.$$

$$\sup_{x\in{\mathbb{R}}}\left|\operatorname{\mathbb{P}}\left(r_{n}-\log(n^{2}/2)\leq x\right)-\exp\bigl(-{\mathrm{e}}^{-x}\bigr)\right|\leq\frac{C(1+\log n)}{\sqrt{n}},\text{ for every }n\in{\mathbb{N}}.$$

$$X_{n,0}:=0<X_{n,1}<\dots<X_{n,n}<1=:X_{n,n+1}.$$

$$L_{n,i}:=X_{n,i}-X_{n,i-1},\text{ for }1\leq i\leq n+1.$$

$$M_{n}:=\max_{1\leq i\leq n+1}L_{n,i},\text{ and }m_{n}:=\min_{1\leq i\leq n+1}L_{n,i}.$$

$$L_{n,1},\dots,L_{n,n+1}\text{ all distinct and non-zero,}$$

$$X_{n+1,i}=\begin{cases}X_{n,i}&\text{for }0\leq i\leq\ell_{n}-1,\\ X_{n,\ell_{n}-1}+U_{n+1}M_{n}&\text{for }i=\ell_{n},\\ X_{n,i-1}&\text{for }\ell_{n}+1\leq i\leq n+2.\end{cases}$$

$$\lim_{n\to\infty}\sup_{x\in[0,1]}\left|{\mathcal{E}}_{n}(x)-x\right|=0,\ \text{a.s.}$$

$$G_{n}(y):=\frac{1}{n+1}\sum_{i=1}^{n+1}{\mathbbm{1}}\{(n+1)L_{n,i}\leq y\},$$

$$\lim_{n\to\infty}\sup_{y\in[0,2]}\left|G_{n}(y)-\frac{y}{2}\right|=0,\ \text{a.s.},$$

$$N_{t}:=\inf\{n\in{\mathbb{Z}}_{+}:M_{n}\leq t\};$$

$$\operatorname{\mathbb{P}}(M_{n}\leq t)=\operatorname{\mathbb{P}}(N_{t}\leq n),\text{ for every }n\in{\mathbb{N}}\text{ and }t\in(0,\infty),$$

$$N_{t}=N^{(1)}_{t/U_{1}}+N^{(2)}_{t/(1-U_{1})}+1,\ \text{a.s., for }0<t<1.$$

$$N_{t}\overset{d}{=}N^{(1)}_{t/U}+N^{(2)}_{t/(1-U)}+1,\text{ for }0<t<1,$$

$$\sup_{x\in{\mathbb{R}}}\left|\operatorname{\mathbb{P}}\left(\sqrt{\frac{2t}{\sigma^{2}}}\left(N_{t}-\frac{2}{t}\right)\leq x\right)-\Phi(x)\right|\leq C\sqrt{t},$$

$$\sup_{x\in{\mathbb{R}}_{+}}\left|\operatorname{\mathbb{P}}\left(n^{2}m_{n}^{D}\geq x\right)-{\mathrm{e}}^{-x}\right|\leq\frac{C}{n},\text{ for every }n\in{\mathbb{N}},$$

$$\mu(t)=\left(\frac{2}{t}-1\right){\mathbbm{1}}\{0<t<1\}.$$

$$v(t)=\begin{cases}\dfrac{s_{0}}{t}&\text{if }0<t\leq 1/2,\\[1ex] 2+\dfrac{2-8\log t}{t}-\dfrac{4}{t^{2}}&\text{if }1/2<t<1.\end{cases}$$

$$\sup_{t\geq t_{0}}\operatorname{\mathbb{E}}(N_{t}^{k})=\operatorname{\mathbb{E}}(N_{t_{0}}^{k})<\infty.$$

$$\operatorname{\mathbb{E}}(N_{t}^{k})\leq C_{k}t^{-k},\text{ for }0<t\leq\frac{1}{k}.$$

$$\operatorname{\mathbb{P}}(N_{t}\geq n)\leq\frac{\operatorname{\mathbb{E}}(N_{t}^{k+1})}{n^{k+1}}\leq\frac{C_{k+1}}{n^{k+1}}\cdot t^{-k-1}.$$

$$\operatorname{\mathbb{E}}(M_{n}^{k})\leq k\int_{0}^{(n+1)^{-1}}t^{k-1}\,{\mathrm{d}}t+k\int_{(n+1)^{-1}}^{1}t^{k-1}\operatorname{\mathbb{P}}(N_{t}\geq n)\,{\mathrm{d}}t.$$

$$\sup_{n\in{\mathbb{N}}}\left(\sqrt{n}\operatorname{\mathbb{E}}\left|nM_{n}-2\right|\right)\leq B_{0}.$$

$$-(n+2)M_{n}\leq-1-M_{n}\leq nM_{n}-2\leq nM_{n},\text{ for all }n\in{\mathbb{N}}.$$

Taxonomy

Topics: Stochastic processes and statistical mechanics

Full text

Rates of convergence for extremal spacings in Kakutani’s random interval-splitting process

Fraser Daly (Department of Actuarial Mathematics and Statistics, and the Maxwell Institute for Mathematical Sciences, Heriot–Watt University, Edinburgh EH14 4AS; [email protected])

Andrew Wade (Department of Mathematical Sciences, Durham University, Durham DH1 3LE; [email protected])

Abstract

Kakutani’s random interval-splitting process iteratively divides, via a uniformly random splitting point, the largest sub-interval in a partition of the unit interval. The length of the longest sub-interval after $n$ steps, suitably centred and scaled, is known to satisfy a central limit theorem as $n\to\infty$ . We provide a quantitative (Berry–Esseen) upper bound for the finite- $n$ approximation in the central limit theorem, with conjecturally optimal rates in $n$ . We also prove convergence to an exponential distribution for the length of the smallest sub-interval, with quantitative bounds. The Kakutani process can be embedded in certain branching and fragmentation processes, and we translate our results into that context also. Our proof uses conditioning on an intermediate time, a conditional independence structure for statistics involving small sub-intervals, an Hermite–Edgeworth expansion, and moments estimates with quantitative error bounds.

Key words: Interval division, Kakutani process, maximum/minimum gap, central limit theorem, Berry–Esseen theorem, Crump–Mode–Jagers process, branching random walk.

AMS Subject Classification: 60F05 (Primary) 60G18, 60J80 (Secondary).

1 Kakutani’s interval-splitting process

1.1 Main results

The subject of this paper is the following random process which takes values on partitions of the unit interval and evolves by successive uniform binary splitting of the maximal interval, attributed to Kakutani. Start with the unit interval $[0,1]$ ; at each subsequent step, choose the largest of the current collection of intervals, and split it into two random subintervals by inserting a uniform random splitting point, independently of previous steps. Ties can be broken arbitrarily, but, with probability $1$ , they do not occur.
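The splitting rule just described can be sketched in a few lines of Python. This is only an illustrative simulation, not code from the paper: the function name `kakutani_gaps`, the heap-based bookkeeping, and the seed are all assumptions made for the example. Only gap lengths are tracked, since the positions of the intervals play no role in which gap is split next.

```python
import heapq
import random


def kakutani_gaps(n, rng):
    """Return the n + 1 gap lengths after n steps of Kakutani's process."""
    # heapq is a min-heap, so lengths are stored negated to pop the largest gap.
    heap = [-1.0]
    for _ in range(n):
        longest = -heapq.heappop(heap)
        u = rng.random()  # uniform splitting point, relative to the chosen gap
        heapq.heappush(heap, -u * longest)
        heapq.heappush(heap, -(1.0 - u) * longest)
    return [-g for g in heap]


rng = random.Random(2025)  # arbitrary seed, for reproducibility only
gaps = kakutani_gaps(1000, rng)
M_n, m_n = max(gaps), min(gaps)  # largest and smallest gap after n = 1000 steps
print(len(gaps), M_n, m_n)
```

Each step costs $O(\log n)$ with a heap, so simulating $n$ steps takes $O(n\log n)$ time; a plain list with a linear search for the maximum would also work for small $n$.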

We give some slightly more formal definitions, and some historical context and motivation, in Section 1.3 below. Our main result, Theorem 1.3, concerns the $n\to\infty$ asymptotics of the random variable $M_{n}$ , the length of the largest among the $n+1$ subintervals in the partition resulting from $n$ steps of the process described above.

Since the lengths of all the subintervals in the partition always sum to $1$, it is clear that $M_{n}>\frac{1}{n+1}$, a.s., for every $n\in{\mathbb{N}}:=\{1,2,3,\ldots\}$. The strong law of large numbers

$$\lim_{n\to\infty}nM_{n}=2,\ \text{a.s.} \tag{1.1}$$

is due to Lootgieter (Corollary 1.2 of [20, p. 397]) and Pyke (Lemma 1 in [26, p. 159]). Over 20 years later, Pyke and van Zwet [27, p. 414] obtained the following central limit theorem (CLT). Let $\Phi$ denote the cumulative distribution function of the standard normal distribution, i.e., $\Phi(x):=(2\pi)^{-1/2}\int_{-\infty}^{x}{\mathrm{e}}^{-z^{2}/2}{\mathrm{d}}z$ for $x\in{\mathbb{R}}$ .

Proposition 1.1 (Pyke & van Zwet, 2004).

Let $\sigma^{2}:=16\log 2-10\approx 1.09035$ . Then

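$$\lim_{n\to\infty}\operatorname{\mathbb{P}}\left(\sqrt{\frac{n^{3}}{\sigma^{2}}}\left(M_{n}-\frac{2}{n}\right)\leq x\right)=\Phi(x),\text{ for every }x\in{\mathbb{R}}.$$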

*Remark 1.2.*

Via an embedding we describe in Section 1.2, and an inversion we describe in Section 1.4, Proposition 1.1 also follows from earlier work of Sibuya & Itoh [31]: see Remark 1.11.

Our main result gives Berry–Esseen bounds on the rate of convergence in the CLT of Proposition 1.1.

Theorem 1.3.

There exists a constant $C\in{\mathbb{R}}_{+}:=[0,\infty)$ such that,

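$$\sup_{x\in{\mathbb{R}}}\left|\operatorname{\mathbb{P}}\left(\sqrt{\frac{n^{3}}{\sigma^{2}}}\left(M_{n}-\frac{2}{n}\right)\leq x\right)-\Phi(x)\right|\leq\frac{C}{\sqrt{n}},\text{ for every }n\in{\mathbb{N}}, \tag{1.2}$$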

where $\sigma^{2}$ is as defined in Proposition 1.1.

*Remark 1.4.*

We conjecture that the $O(n^{-1/2})$ error bound in Theorem 1.3 is optimal, as is generic in the classical Berry–Esseen theorem (see e.g. [10, §7.6]). While we do not have a proof of a lower bound of matching order to the upper bound in (1.2), Figure 1 presents some simulation evidence that appears to support this conjecture.

A second limit theorem that we deduce from some of the structural results that we develop in this paper concerns the length $m_{n}$ of the smallest gap after $n$ steps of Kakutani’s interval-splitting process. In particular, the next result shows that $n^{2}m_{n}/2$ converges in distribution to a unit-mean exponential distribution as $n\to\infty$ .

Theorem 1.5.

There exists a constant $C\in{\mathbb{R}}_{+}$ such that,

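$$\sup_{x\in{\mathbb{R}}_{+}}\left|\operatorname{\mathbb{P}}\left(\frac{n^{2}m_{n}}{2}>x\right)-{\mathrm{e}}^{-x}\right|\leq\frac{C(1+\log n)}{\sqrt{n}},\text{ for every }n\in{\mathbb{N}}. \tag{1.3}$$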

*Remark 1.6.*

We suspect that the $n^{-1/2}$ polynomial rate in (1.3) cannot be improved (see Figure 1), in contrast to the $n^{-1}$ rate in the analogous limit for the Dirichlet process at (1.19) below, but we are unsure if the $\log n$ factor is sharp. Establishing the optimal rate of convergence in Theorem 1.5, including removing any possibly superfluous logarithmic factors and explaining apparent non-linearities in the right-hand plot of Figure 1, is left as a topic for future work.
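For readers who want to experiment numerically, the normalizations appearing in Theorems 1.3 and 1.5 can be written as two small helper functions. This is a sketch for illustration only; the names `SIGMA2`, `standardized_max` and `rescaled_min` are not from the paper.

```python
import math

# sigma^2 := 16 log 2 - 10, as in Proposition 1.1
SIGMA2 = 16.0 * math.log(2.0) - 10.0


def standardized_max(n, M_n):
    """sqrt(n^3 / sigma^2) * (M_n - 2/n): approximately standard normal (Theorem 1.3)."""
    return math.sqrt(n**3 / SIGMA2) * (M_n - 2.0 / n)


def rescaled_min(n, m_n):
    """n^2 * m_n / 2: approximately unit-mean exponential (Theorem 1.5)."""
    return n * n * m_n / 2.0


print(round(SIGMA2, 5))
```

Feeding these functions with the maximal and minimal gaps from many independent simulated runs gives empirical distributions that can be compared with $\Phi$ and with $1-{\mathrm{e}}^{-x}$, respectively.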

We give an overview of our proof strategy, and the organization of the paper, in Section 1.5 below. First, in Section 1.2 we describe how our result can be interpreted in the context of a branching process, and its relationship to some adjacent models.

1.2 Related branching, fragmentation, and parking processes

The recursive interval division of the Kakutani process is reminiscent of branching and fragmentation structures. Indeed, the Kakutani process admits several embeddings into (perhaps more familiar) classes of probability models, and is adjacent to several others. In this section we indicate some of these links, where, typically, optimal Berry–Esseen results do not appear to be known, in part to draw attention to scope for possible extensions of the present work.

We believe that Kingman [16] was the first to explicitly use an embedding into a branching process to put the strong law (1.1) for the Kakutani process into a more general framework; essentially the same construction appears in Sibuya & Itoh [31] but without the explicit link to the Kakutani model. Subsequent work has also emphasized correspondences to fragmentation processes and branching random walks, where considerable technology has been developed. The Kakutani process translates to quite special versions of these general structures, and, as far as we are aware, our main results are not subsumed within the general literature.

Kingman [16] observed that applying the bijective map $x\mapsto\log(1/x)$ to the collection of subinterval lengths in the Kakutani process gives a process on $(0,\infty)$ in which interval splitting translates to additive displacement; there are several ways to exploit this. The first is Kingman’s original embedding.

Total population in a binary branching process.

In the Kakutani process, since $M_{n}\to 0$ , a.s., every subinterval present in the process at some time $m$ , say, will be split at some time $n\geq m$ , and hence have length equal to $M_{n}$ . In other words, ${\mathcal{L}}:=\{M_{0},M_{1},M_{2},\ldots\}$ is the collection of all (a.s. distinct) subinterval lengths observed during the Kakutani process, every $L_{n,i}$ appears in ${\mathcal{L}}$ , and, typically, $L_{n,i}=L_{m,j}$ for many pairs $(n,i)$ , $(m,j)$ . The length $M_{n}$ is in position $n+1$ in ${\mathcal{L}}$ viewed as an ordered list. That is, if $N_{u}:=\sum_{\ell\in{\mathcal{L}}}{\mathbbm{1}\mkern-1.5mu}{\{\ell>u\}}$ , we have $\{M_{n}>u\}$ if and only if $\{N_{u}>n\}$ .

Applying the map $x\mapsto\log(1/x)$ gives the branching process interpretation as a population of individuals with birth times indexed by ${\mathcal{T}}:=-\log{\mathcal{L}}\subset{\mathbb{R}}_{+}$: the ancestor $0=-\log M_{0}\in{\mathcal{T}}$ is born at time $0$, and every individual $t\in{\mathcal{T}}$ gives birth to two offspring at two (different, correlated) times distributed as $t-\log U$ and $t-\log(1-U)$, $U\sim{\mathrm{Unif}[0,1]}$. (Throughout the paper, the notation ${\mathrm{Unif}[a,b]}$ stands for the continuous uniform distribution on interval $[a,b]$, for real $a<b$.) The quantity $N_{{\mathrm{e}}^{-t}}=T_{t}:=\#({\mathcal{T}}\cap[0,t))$ is the total population number observed before time $t\in{\mathbb{R}}_{+}$.
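The identification of the total population $T_{t}$ with a threshold time for the maximal gap can be checked pathwise on a simulated run. The sketch below is illustrative only: the helper names (`max_gap_path`, `total_population`, `threshold_time`) are hypothetical, and the simulated path must be long enough that $M_{n}$ has dropped below ${\mathrm{e}}^{-t}$.

```python
import heapq
import math
import random


def max_gap_path(n, rng):
    """Return [M_0, M_1, ..., M_n] for one run of Kakutani's process."""
    heap = [-1.0]  # negated gap lengths, so the heap top is the largest gap
    path = []
    for _ in range(n + 1):
        longest = -heapq.heappop(heap)
        path.append(longest)  # this is the maximal gap at the current time
        u = rng.random()
        heapq.heappush(heap, -u * longest)
        heapq.heappush(heap, -(1.0 - u) * longest)
    return path


def total_population(path, t):
    """T_t: number of individuals with birth time -log(M_n) in [0, t)."""
    return sum(1 for M in path if -math.log(M) < t)


def threshold_time(path, u):
    """First n with M_n <= u; the path must be long enough to reach u."""
    for n, M in enumerate(path):
        if M <= u:
            return n
    raise ValueError("path too short for this threshold")


path = max_gap_path(2000, random.Random(7))  # arbitrary seed
for t in (0.5, 2.0, 4.0):
    print(t, total_population(path, t), threshold_time(path, math.exp(-t)))
```

The two printed counts agree because $M_{n}$ is non-increasing in $n$, so the number of indices with $M_{n}>u$ is the same as the first index at which $M_{n}\leq u$.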

This construction, relating $M_{n}$ in the Kakutani process to total population size in a Crump–Mode–Jagers branching process with binary offspring and correlated birth times, is due to Kingman [16]. The following is a translation of Theorem 1.3 (in fact, the translation is direct from Theorem 1.10 via the inversion described in Section 1.4 below).

Corollary 1.7.

There exists $C\in{\mathbb{R}}_{+}$ such that, with $\sigma^{2}$ as defined in Proposition 1.1,

[TABLE]

A different perspective on Kingman’s construction, going back to Sibuya & Itoh [31], interprets $N_{u}$ as the height of a random fragmentation tree, a subject studied more generally in [13] (see also the references therein). That description is very close to the one we present in Section 1.4, and so we defer further discussion to Remark 1.11 below.

Recently, for a general class of Crump–Mode–Jagers processes, deep results [11, 12] have been obtained generalizing the convergence in distribution part of Corollary 1.7. Theorem 3.2 of [11] is specific to binary branching but does not apply directly to our case, as that work assumes independent birth times for siblings, while [12] does admit the correlations present here. Of course, the non-quantitative part of Corollary 1.7 is already known via a translation of Proposition 1.1, but the results of [11, 12] indicate that there is a much more general setting than the Kakutani process to which one might seek to extend the Berry–Esseen results of Theorem 1.3. We are not aware of any existing Berry–Esseen results in the Crump–Mode–Jagers context, and our approach in the present paper seems to make essential use of features of the Kakutani model.

Extremum-driven branching random walk.

Applying the bijective map $x\mapsto\log(1/x)$ to the collection of subinterval lengths, but retaining the original time indexing, translates the Kakutani process to the following special type of discrete-time branching random walk on ${\mathbb{R}}_{+}$. Start with a single particle at the origin. At each step, the leftmost particle is removed, and replaced with two offspring, displaced relative to the parent by $-\log U$ and $-\log(1-U)$ for a fresh $U\sim{\mathrm{Unif}[0,1]}$; each displacement is a unit-mean exponential random variable, but the two are correlated, as for the sibling birth times in the branching process above. Let $\ell_{n},r_{n}$ be the locations of the leftmost and rightmost particles, respectively, after $n$ branching events. Then $\ell_{n}=\log(1/M_{n})$ and $r_{n}=\log(1/m_{n})$ in terms of $M_{n},m_{n}$ in the Kakutani process described in Section 1.1. The following is then a consequence of Theorems 1.3 and 1.5.

Corollary 1.8.

There exists $C\in{\mathbb{R}}_{+}$ such that, with $\sigma^{2}$ as defined in Proposition 1.1,

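$$\sup_{x\in{\mathbb{R}}}\left|\operatorname{\mathbb{P}}\left(\sqrt{\frac{4n}{\sigma^{2}}}\left(\ell_{n}-\log(n/2)\right)\leq x\right)-\Phi(x)\right|\leq\frac{C}{\sqrt{n}},\text{ for every }n\in{\mathbb{N}}.$$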

Moreover,

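$$\sup_{x\in{\mathbb{R}}}\left|\operatorname{\mathbb{P}}\left(r_{n}-\log(n^{2}/2)\leq x\right)-\exp\bigl(-{\mathrm{e}}^{-x}\bigr)\right|\leq\frac{C(1+\log n)}{\sqrt{n}},\text{ for every }n\in{\mathbb{N}}.$$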
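The correspondence $\ell_{n}=\log(1/M_{n})$, $r_{n}=\log(1/m_{n})$ can be illustrated by driving the particle system and the interval process with the same uniform variables. The code below is a sketch under that assumption; the function names are hypothetical, and the two offspring are displaced by $-\log U$ and $-\log(1-U)$, mirroring a split of the parent gap into fractions $U$ and $1-U$.

```python
import heapq
import math
import random


def open_uniform(rng):
    """Draw U uniformly from (0, 1), rejecting 0.0 so that both logs are finite."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def branching_walk(n, rng):
    """Particle positions after n branching events of the leftmost particle."""
    positions = [0.0]  # min-heap: the leftmost particle sits at the top
    for _ in range(n):
        parent = heapq.heappop(positions)
        u = open_uniform(rng)
        heapq.heappush(positions, parent - math.log(u))
        heapq.heappush(positions, parent - math.log(1.0 - u))
    return positions


def kakutani_gaps(n, rng):
    """Gap lengths after n steps of Kakutani's process (largest gap is split)."""
    gaps = [-1.0]  # negated lengths, so the heap top is the largest gap
    for _ in range(n):
        longest = -heapq.heappop(gaps)
        u = open_uniform(rng)
        heapq.heappush(gaps, -u * longest)
        heapq.heappush(gaps, -(1.0 - u) * longest)
    return [-g for g in gaps]


# The same seed gives the same uniforms, hence coupled versions of both processes.
walk = branching_walk(300, random.Random(11))
gaps = kakutani_gaps(300, random.Random(11))
print(min(walk), math.log(1.0 / max(gaps)))  # leftmost particle vs log(1/M_n)
print(max(walk), math.log(1.0 / min(gaps)))  # rightmost particle vs log(1/m_n)
```

Up to floating-point rounding, each printed pair should coincide, since every particle position is minus the logarithm of the corresponding gap length.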

Other models with branching and rank-dependent dynamics, but a fixed population size, have been studied, motivated, for example, by selective pressures on individuals in an evolving population [7, 5] or on species in an evolving ecosystem [3].

A zero-length slack parking model.

Fix parameters $x\in(0,\infty)$ and $\ell\in[0,1]$ . Take a sequence of independent ${\mathrm{Unif}[0,x]}$ random variables, representing the left endpoints of length- $\ell$ cars that successively arrive at the kerb $[0,x]$ . Each car is allowed to park if and only if (i) its extent is contained in the subset of $[0,x]$ not already occupied by parked cars, and (ii) the gap in which it parks has length exceeding 1. The process becomes jammed once no gap between neighbouring cars has length exceeding $1$ , at which point no more cars can park. Let $P_{\ell,x}$ denote the (random) number of cars parked at jamming.

The $\ell=1$ version of this model was first studied by Rényi [28], who evaluated the asymptotic parking fraction $\lim_{x\to\infty}x^{-1}\operatorname{\mathbb{E}}P_{1,x}$ . The generalization $0\leq\ell<1$ , which introduces some slack around each car, is included in [25, 31], with the asymptotic parking fraction being obtained in [17]. The parking model is part of a wide class of models motivated by various irreversible physical and chemical processes, known more broadly as random sequential adsorption.

When $\ell=0$ (zero-length cars) the random variable $P_{0,x}$ has the interpretation as the number of splitting events in the following procedure: starting with the interval $[0,x]$ , uniformly split any interval of length exceeding $1$ until there is no such interval left. It is then not hard to see that $P_{0,x}$ has the same distribution as the number of steps in the Kakutani model until all intervals have length at most $1/x$ . This quantity is described in more detail in Section 1.4 below; specifically, $P_{0,x}$ is $N_{1/x}$ defined at (1.14). In Theorem 1.10 below we give a Berry–Esseen theorem for $N_{t}$ as $t\to 0$ , and hence for $P_{0,x}$ as $x\to\infty$ .
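The zero-length splitting procedure is easy to simulate directly. The following sketch is illustrative only (the function name `zero_length_parking` and the stack-based traversal are assumptions of the example); it counts the splits needed until no piece of $[0,x]$ is longer than $1$. Under the identification of $P_{0,x}$ with the threshold time $N_{1/x}$, the known mean of the threshold time gives $\operatorname{\mathbb{E}}P_{0,x}=2x-1$ for $x>1$, which the Monte Carlo average at the end can be compared against.

```python
import random


def zero_length_parking(x, rng):
    """Split [0, x] uniformly until no piece is longer than 1.

    Returns (number of splits, list of final piece lengths).
    """
    pending = [x]  # pieces that still have to be examined
    final = []
    splits = 0
    while pending:
        length = pending.pop()
        if length > 1.0:
            u = rng.random()
            pending.append(u * length)
            pending.append((1.0 - u) * length)
            splits += 1
        else:
            final.append(length)
    return splits, final


rng = random.Random(5)  # arbitrary seed
counts = [zero_length_parking(10.0, rng)[0] for _ in range(2000)]
print(sum(counts) / len(counts))  # compare with 2 * 10 - 1 = 19
```

The order in which the long pieces are processed does not affect the final count, which is why a simple stack suffices here even though Kakutani's process always splits the longest piece first.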

For $\ell=1$ (the original Rényi model) a CLT for $P_{1,x}$ is due to Dvoretzky & Robbins [9], and a Berry–Esseen theorem for $P_{1,x}$ was obtained by Schreiber, Penrose & Yukich [30], as a special case of a much more general result, but with $\log$ factors in the rate that Theorem 1.10 below suggests one might be able to remove, in the one-dimensional case, by adapting the approach of the present paper. More recently, a refined general approach for Berry–Esseen bounds for functions of Poisson point processes, with presumably-optimal rates, has been given by [18], but, as far as we know, has yet to be successfully applied to random sequential adsorption.

1.3 Notation and background

To discuss the earlier work on the Kakutani model, and for later use in the present paper, we need some more notation. At time $n\in{\mathbb{Z}}_{+}:=\{0\}\cup{\mathbb{N}}$ (that is, after $n$ splitting events) the process is represented by the ordered collection of interval end-points

$$X_{n,0}:=0<X_{n,1}<\dots<X_{n,n}<1=:X_{n,n+1}. \tag{1.6}$$

The associated gap lengths at time $n\in{\mathbb{Z}}_{+}$ are

$$L_{n,i}:=X_{n,i}-X_{n,i-1},\text{ for }1\leq i\leq n+1. \tag{1.7}$$

The maximal and minimal gap lengths at time $n\in{\mathbb{Z}}_{+}$ are, respectively,

$$M_{n}:=\max_{1\leq i\leq n+1}L_{n,i},\text{ and }m_{n}:=\min_{1\leq i\leq n+1}L_{n,i}. \tag{1.8}$$

The dynamics are driven by $U_{1},U_{2},\ldots$ , a sequence of independent ${\mathrm{Unif}[0,1]}$ random variables. Let $X_{1,1}:=U_{1}$ and then, recursively, given

$$L_{n,1},\dots,L_{n,n+1}\text{ all distinct and non-zero,} \tag{1.9}$$

define $\ell_{n}:=\operatorname*{arg\mkern 1.0mu\max}_{1\leq i\leq n+1}L_{n,i}$ (which is uniquely defined). Then set

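$$X_{n+1,i}=\begin{cases}X_{n,i}&\text{for }0\leq i\leq\ell_{n}-1,\\ X_{n,\ell_{n}-1}+U_{n+1}M_{n}&\text{for }i=\ell_{n},\\ X_{n,i-1}&\text{for }\ell_{n}+1\leq i\leq n+2.\end{cases} \tag{1.10}$$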

The fact that the successive divisions are generated by continuous distributions ensures that, a.s., properties (1.6) and (1.9) persist for all $n\in{\mathbb{N}}$ .

Kakutani conjectured in a 1973 lecture, in response to a question from H. Araki (see [32, p. 341], [15, p. 571] and [1, p. 258]) that the empirical distribution ${\mathcal{E}}_{n}(x):=\frac{1}{n}\sum_{i=1}^{n}{\mathbbm{1}\mkern-1.5mu}{\{X_{n,i}\leq x\}}$ of endpoints is asymptotically uniform, or equidistributed, i.e.

$$\lim_{n\to\infty}\sup_{x\in[0,1]}\left|{\mathcal{E}}_{n}(x)-x\right|=0,\ \text{a.s.} \tag{1.11}$$

Kakutani [14] obtained an analogous result for a deterministic model in which the ${\mathrm{Unif}[0,1]}$ splitting distribution is substituted by a fixed parameter $\alpha\in(0,1)$ for the relative location of the division point. For the ${\mathrm{Unif}[0,1]}$ -splitting process, establishing (1.11) was posed as a challenge by R. Dudley in 1976 [6, p. 2443]. The result (1.11) was proved by van Zwet in [33] (submitted in February 1977) and, independently, using similar ideas, by Lootgieter [19, 20] (submitted later the same year). Van Zwet [33, p. 137] also acknowledges that Komlós and Tusnády were aware that a proof could be constructed by the same method. A paper of Slud [32], submitted in October 1976, asserted a proof of (1.11), via a rather different approach, but was found to contain an error; Slud later produced a correction. Results on empirical distributions extending (1.11) to a much wider class of interval-division schemes can be found in [22]; see also references therein.

Define, for $n\in{\mathbb{N}}$ and $y\in{\mathbb{R}}$ ,

$$G_{n}(y):=\frac{1}{n+1}\sum_{i=1}^{n+1}{\mathbbm{1}}\{(n+1)L_{n,i}\leq y\}, \tag{1.12}$$

the distribution function of the empirical measure for the normalized gap lengths. Pyke’s uniform limit theorem [26, Thm. 1, p. 161] shows that

$$\lim_{n\to\infty}\sup_{y\in[0,2]}\left|G_{n}(y)-\frac{y}{2}\right|=0,\ \text{a.s.}, \tag{1.13}$$

which shows that a typical gap is approximately ${\mathrm{Unif}[0,2/n]}$ in distribution. While (1.13) says that there are at most $o(n)$ gaps of size bigger than $(2+\varepsilon)/n$ , $\varepsilon>0$ , Proposition 1.1 says that even the maximum gap will typically be only of order $n^{-3/2}$ different from $2/n$ .
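Pyke's uniform limit can be explored numerically by computing the empirical distribution function of the normalized gaps and its distance from $y\mapsto y/2$ on $[0,2]$. The sketch below is an illustration, not the paper's method: the function names are hypothetical, and the supremum is only approximated on a uniform grid (the parameter `grid_size` is an assumption of the example).

```python
import bisect
import heapq
import random


def kakutani_gaps(n, rng):
    """Gap lengths after n steps of Kakutani's process."""
    heap = [-1.0]  # negated lengths, so the heap top is the largest gap
    for _ in range(n):
        longest = -heapq.heappop(heap)
        u = rng.random()
        heapq.heappush(heap, -u * longest)
        heapq.heappush(heap, -(1.0 - u) * longest)
    return [-g for g in heap]


def gap_ecdf_sup_distance(gaps, grid_size=2000):
    """Approximate sup over y in [0, 2] of |G_n(y) - y/2| on a uniform grid."""
    count = len(gaps)  # equals n + 1
    scaled = sorted(count * g for g in gaps)  # normalized gaps (n + 1) * L_{n,i}
    worst = 0.0
    for k in range(grid_size + 1):
        y = 2.0 * k / grid_size
        G = bisect.bisect_right(scaled, y) / count  # fraction of gaps <= y
        worst = max(worst, abs(G - y / 2.0))
    return worst


distance = gap_ecdf_sup_distance(kakutani_gaps(5000, random.Random(1)))
print(distance)
```

Repeating the computation for increasing $n$ should show the distance shrinking, in line with the almost sure convergence stated above.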

1.4 Inversion and threshold times

Associated to the Kakutani process are the random times $N_{t}$ , $t\in(0,\infty)$ , defined by

$$N_{t}:=\inf\{n\in{\mathbb{Z}}_{+}:M_{n}\leq t\}; \tag{1.14}$$

since $M_{0}=1$ , we have $N_{t}=0$ for all $t\geq 1$ . Moreover, since $\lim_{n\to\infty}M_{n}=0$ a.s. (which follows of course from (1.1), but also via a short elementary argument, as given by Kingman [16, p. 148]), we have $\operatorname{\mathbb{P}}(N_{t}<\infty\text{ for every }t>0)=1$ . The usefulness of $N_{t}$ as defined at (1.14) for analyzing $M_{n}$ is due, firstly, to the inversion relation

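$$\operatorname{\mathbb{P}}(M_{n}\leq t)=\operatorname{\mathbb{P}}(N_{t}\leq n),\text{ for every }n\in{\mathbb{N}}\text{ and }t\in(0,\infty), \tag{1.15}$$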

and, secondly, a more readily accessible recursive structure. Indeed, by conditioning on the first split (through the variable $U_{1}$ ) we obtain the fundamental self-similarity relation

$$N_{t}=N^{(1)}_{t/U_{1}}+N^{(2)}_{t/(1-U_{1})}+1,\ \text{a.s., for }0<t<1. \tag{1.16}$$

In (1.16), the processes $N^{(1)}:=(N^{(1)}_{s})_{s\in(0,1)}$ and $N^{(2)}:=(N^{(2)}_{s})_{s\in(0,1)}$ are independent of $U_{1}$ and of each other, and each has the same distribution as $(N_{t})_{t\in(0,1)}$ . To see that (1.16) is true, observe that to reach time $N_{t}$ the intervals $[0,U_{1}]$ and $[U_{1},1]$ undergo independent Kakutani processes (and one can choose to execute all splittings on one side first, for example) but scaled by the relevant length factor, i.e., $U_{1}$ or $1-U_{1}$ . When the identification of $U_{1}$ in (1.16) with the generating sequence of the process is not relevant, we can write (1.16) in distributional form as

$$N_{t}\overset{d}{=}N^{(1)}_{t/U}+N^{(2)}_{t/(1-U)}+1,\text{ for }0<t<1, \tag{1.17}$$

where, on the right-hand side of (1.17), $U\sim{\mathrm{Unif}[0,1]}$ , and $U,N^{(1)},N^{(2)}$ are independent. The central role of the threshold times $N_{t}$ defined at (1.14) and their associated recursions (1.16)–(1.17) was already identified, independently, by van Zwet [33, p. 134] and Lootgieter [19, p. 404].

The CLT for $M_{n}$ , Proposition 1.1, was obtained by Pyke & van Zwet [27] via the inversion (1.15) from a corresponding CLT for $N_{t}$ .

Proposition 1.9 (Pyke & van Zwet, 2004).

Let $\sigma^{2}$ be as in Proposition 1.1. As $t\to 0$, $\sqrt{t}\left(N_{t}-\frac{2}{t}\right)$ converges in distribution to the normal distribution with mean $0$ and variance $\sigma^{2}/2$.

In a similar way, we will obtain our quantitative CLT, Theorem 1.3, via an inversion of a corresponding Berry–Esseen result for $N_{t}$ .

Theorem 1.10.

There exists a constant $C\in{\mathbb{R}}_{+}$ such that, for all $t\in(0,\infty)$ ,

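$$\sup_{x\in{\mathbb{R}}}\left|\operatorname{\mathbb{P}}\left(\sqrt{\frac{2t}{\sigma^{2}}}\left(N_{t}-\frac{2}{t}\right)\leq x\right)-\Phi(x)\right|\leq C\sqrt{t}, \tag{1.18}$$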

where $\sigma^{2}$ is as defined in Proposition 1.1.
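To see why the inversion turns the $\sqrt{t}$ scaling for $N_{t}$ into the $n^{3/2}$ scaling for $M_{n}$, the following heuristic computation may help. It is a sketch only, at the level of the non-quantitative limits (Proposition 1.9 implying Proposition 1.1); the threshold $t_{n}$ is notation introduced here for the illustration, and the error terms are not controlled as they would need to be for the Berry–Esseen bound.

```latex
% Heuristic inversion: fix x and choose the threshold t_n matching the CLT scaling.
\begin{align*}
t_n &:= \frac{2}{n} + \frac{x\sigma}{n^{3/2}}, \qquad x \in \mathbb{R} \text{ fixed}, \\
\mathbb{P}\Bigl( \sqrt{n^{3}/\sigma^{2}}\,\bigl( M_n - \tfrac{2}{n} \bigr) \le x \Bigr)
  &= \mathbb{P}( M_n \le t_n ) = \mathbb{P}( N_{t_n} \le n )
  && \text{by (1.15)}, \\
\frac{2}{t_n} &= n \Bigl( 1 + \frac{x\sigma}{2\sqrt{n}} \Bigr)^{-1}
  = n - \frac{x\sigma\sqrt{n}}{2} + O(1), \\
\sqrt{t_n}\,\Bigl( n - \frac{2}{t_n} \Bigr)
  &= \sqrt{\tfrac{2}{n}}\,\bigl( 1 + o(1) \bigr) \Bigl( \frac{x\sigma\sqrt{n}}{2} + O(1) \Bigr)
  \longrightarrow \frac{x\sigma}{\sqrt{2}}, \\
\mathbb{P}( N_{t_n} \le n )
  &= \mathbb{P}\Bigl( \sqrt{t_n}\,\bigl( N_{t_n} - \tfrac{2}{t_n} \bigr)
     \le \sqrt{t_n}\,\bigl( n - \tfrac{2}{t_n} \bigr) \Bigr)
  \longrightarrow \Phi(x).
\end{align*}
```

The last limit uses that a centred normal variable with variance $\sigma^{2}/2$ lies below $x\sigma/\sqrt{2}$ with probability $\Phi(x)$.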

*Remark 1.11.*

Proposition 1.9 is Corollary 3.3 in [27, p. 396]. By Kingman’s embedding (see Section 1.2), it can also be recovered from the earlier Theorem 2 of [31]. See also [13, pp. 435–6] for a framing in terms of general CLTs for the height of random fragmentation trees; we do not know of any Berry–Esseen results in that context.

1.5 Overview of the proofs and some further remarks

Overview of the proofs.

The main work of the paper is proving Theorem 1.10. Theorem 1.3 will be deduced from a careful (but not difficult) inversion of Theorem 1.10, and Theorem 1.5 from some results established in the course of the proof of Theorem 1.10. The proof of Theorem 1.10 is in part analytical, with some delicate estimates needed to obtain our presumably-optimal rates, and the overall structure is perhaps of interest more broadly, and is broken down into the following main steps.

• The first step in the main line of the proof is to apply the classical Berry–Esseen theorem to prove (in Section 4) a conditional Berry–Esseen theorem (Proposition 4.1) for $N_{t}$ given the first $n$ steps of the process, where $4(n+1)t<1$, ensuring that $0<N_{t}-n$ can be expressed as a sum of independent variables, using the basic self-similarity (1.17). Eventually, we will take $n\approx c_{1}/t$ for a small $c_{1}$.

• The centering and scaling quantities in the conditional Berry–Esseen bound are themselves random variables (functions of $U_{1},\ldots,U_{n}$), related to conditional means and variances, denoted by $R_{n,t}$ and $S_{n,t}$ and defined at (4.2) and (4.7) below. To “uncondition” the bound, we need detailed information about the joint distribution of $R_{n,t}$ and $S_{n,t}$, summarized in Proposition 5.1 on their mixed moments. This is proved in Section 5, with groundwork laid in Sections 3 and 4 and making use of auxiliary results stated in Appendix A.

• To study the joint distribution of $R_{n,t}$ and $S_{n,t}$, we exploit the fact that both can be expressed as sum-type statistics of small gaps, which enjoy a crucial conditional independence structure that we clarify in Section 3. This study of small gaps will also lead to a short proof of Theorem 1.5, given in Section 3.2.

• The “unconditioning” is achieved by a sort of Hermite–Edgeworth expansion, stated in Proposition 6.1. Combining this with Proposition 5.1, to control the remaining error terms in the expansion, leads to the proof of Theorem 1.10, given in Section 6.

• To prepare for all of the above, we first collect some results (some known, some new, making use of ideas from [27]) on moments of $N_{t}$ and $M_{n}$ in Section 2.

Following the proofs of our main theorems, the proofs of Corollaries 1.7 and 1.8 are given in Section 6.

Comparison to the Dirichlet process and uniform spacings.

A natural comparator to the Kakutani problem is the Dirichlet partition of the interval generated by $n$ ${\mathrm{Unif}[0,1]}$ random variables, which can also be generated sequentially, like the Kakutani process, but one splits an interval chosen at random with probability proportional to its length (rather than always the longest). Denote the maximal spacing in the Dirichlet process after $n$ divisions by $M_{n}^{D}$ . A result of Lévy from 1939 shows that $nM_{n}^{D}-\log n$ has a Gumbel limit, and Slud [32] showed that $nM_{n}^{D}/\log n\to 1$ , a.s., as $n\to\infty$ .

Let $m_{n}^{D}$ denote the length of the smallest gap in the $n$ -division Dirichlet process. A direct calculation shows that $\operatorname{\mathbb{P}}(m_{n}^{D}\geq x)=(1-(n+1)x)^{n}$ , $x\in[0,\frac{1}{n+1}]$ , from which one can show

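$$\sup_{x\in{\mathbb{R}}_{+}}\left|\operatorname{\mathbb{P}}\left(n^{2}m_{n}^{D}\geq x\right)-{\mathrm{e}}^{-x}\right|\leq\frac{C}{n},\text{ for every }n\in{\mathbb{N}}, \tag{1.19}$$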

and this bound is of the optimal order in $n$. It follows from (1.19) that $n^{2}m_{n}^{D}$ converges in distribution to a unit-mean exponential, whereas Theorem 1.5 gives the same limit for $n^{2}m_{n}/2$; thus the smallest gap in the Dirichlet process is typically about half the size of the smallest gap in the Kakutani process. Some intuition for this comes from observing that in the Dirichlet process, one is typically splitting a gap of length on average half the size of $M_{n}$, the length split in the Kakutani process.
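The exponential limit for the Dirichlet minimum, and the $1/n$ size of the correction, can be read off from the explicit tail formula by a short expansion. The computation below is a pointwise sketch for fixed $y$ (a symbol introduced here); it does not by itself give the uniform bound over all arguments.

```latex
% Pointwise expansion of the exact tail of the rescaled Dirichlet minimum.
\begin{align*}
\mathbb{P}\bigl( n^{2} m_n^{D} \ge y \bigr)
  &= \Bigl( 1 - \frac{(n+1)\,y}{n^{2}} \Bigr)^{n},
  \qquad 0 \le y \le \frac{n^{2}}{n+1}, \\
n \log\Bigl( 1 - \frac{y}{n} - \frac{y}{n^{2}} \Bigr)
  &= -y - \frac{y}{n} - \frac{y^{2}}{2n} + O(n^{-2}), \\
\mathbb{P}\bigl( n^{2} m_n^{D} \ge y \bigr)
  &= \mathrm{e}^{-y} \Bigl( 1 - \frac{y + y^{2}/2}{n} + O(n^{-2}) \Bigr).
\end{align*}
```

The first-order correction is of exact order $1/n$ for any fixed $y>0$, which is consistent with the statement that the bound cannot be improved in $n$.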

Other order statistics.

We expect that one can obtain some information about the length of near-maximal gaps, or near-minimal gaps, using our method and some extra work. It would be of interest to obtain results for more general order statistics of spacings, but it is not clear to us how to do this.

2 Means, variances, and moment bounds

In this section we study moments of the random variables $M_{n}$ and $N_{t}$ . For $t\in(0,\infty)$ , let $\mu(t):=\operatorname{\mathbb{E}}N_{t}$ and $v(t):=\operatorname{\mathbb{V}ar}N_{t}$ . Since $N_{t}=0$ , a.s., for $t\geq 1$ , we have $\mu(t)=v(t)=0$ for all $t\geq 1$ , so of interest is only $t\in(0,1)$ . The following exact results are known.

Proposition 2.1 (Lootgieter, 1977; van Zwet, 1978).

It holds that

$$\mu(t)=\left(\frac{2}{t}-1\right){\mathbbm{1}}\{0<t<1\}. \tag{2.1}$$

Moreover, with $s_{0}:=8\log 2-5\approx 0.545177$ , it holds that

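$$v(t)=\begin{cases}\dfrac{s_{0}}{t}&\text{if }0<t\leq 1/2,\\[1ex] 2+\dfrac{2-8\log t}{t}-\dfrac{4}{t^{2}}&\text{if }1/2<t<1.\end{cases} \tag{2.2}$$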

*Remark 2.2.*

Proposition 2.1 is due, independently, to van Zwet [33] (submitted February 1977) and Lootgieter [20] (September 1977). Proposition 1.2 of [20, p. 396] covers both results for $\mu$ and $v$ , while [33] has the result for $\mu$ , and showed that $v(t)=v(1/2)/(2t)$ for $0<t\leq 1/2$ , but had not evaluated $v(1/2)=16\log 2-10$ . Both proofs go by an analysis of the recursion (1.17). The full result for $v$ was rediscovered by Pyke & van Zwet [27, p. 392]. Furthermore, $v(t)$ coincides with the quantity $v_{e}(1/t,0)$ in [31], which satisfies the same integral equation, and then (2.2) can be found in [31, pp. 75, 83].
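As an illustration of how the recursion (1.17) determines the mean, one can take expectations on both sides and check that $\mu(t)=2/t-1$ solves the resulting integral equation on $(0,1)$. This is only a verification sketch: it assumes $\operatorname{\mathbb{E}}N_{t}<\infty$ and does not address uniqueness of the solution, which the cited proofs handle.

```latex
% Taking expectations in (1.17), using the symmetry of U and 1-U, and mu(s) = 0 for s >= 1.
\begin{align*}
\mu(t) &= 1 + \mathbb{E}\,\mu(t/U) + \mathbb{E}\,\mu\bigl(t/(1-U)\bigr)
        = 1 + 2 \int_{t}^{1} \mu(t/u)\,\mathrm{d}u,
        \qquad 0 < t < 1, \\
2 \int_{t}^{1} \Bigl( \frac{2u}{t} - 1 \Bigr)\,\mathrm{d}u
       &= 2 \Bigl( \frac{1 - t^{2}}{t} - (1 - t) \Bigr)
        = \frac{2\,(1-t)}{t}, \\
1 + \frac{2\,(1-t)}{t} &= \frac{2}{t} - 1 .
\end{align*}
```

The integral starts at $u=t$ because $\mu(t/u)=0$ whenever $t/u\geq 1$.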

To prepare for our later arguments, we build on analysis of [27] to state some estimates for (higher) moments of $N_{t}$ (Lemma 2.3) and for moments of $M_{n}$ (Lemma 2.4). The intuition in both cases is that the variables are concentrated about their respective means, namely $N_{t}\approx 2/t$ (for small $t$ ) and $M_{n}\approx 2/n$ (for large $n$ ). The following rough, but useful, upper bounds on the moments of $N_{t}$ are derived directly from results in [27].

Lemma 2.3.

For each $k\in{\mathbb{N}}$ there exists $C_{k}\in{\mathbb{R}}_{+}$ with $\operatorname{\mathbb{E}}(N_{t}^{k})\leq C_{k}t^{-k}$ , for all $0<t\leq 1$ .

Proof.

Let $k\in{\mathbb{N}}$ . It follows from (1.14) that $\operatorname{\mathbb{P}}(N_{t}=0)=1$ for $t\geq 1$ , and $\operatorname{\mathbb{P}}(1\leq N_{t}\leq N_{s}<\infty)=1$ for all $0<s\leq t<1$ , so that $\operatorname{\mathbb{E}}(N_{t}^{k})$ is non-increasing in $t>0$ . Lemma 2.1 of [27, p. 385] shows that, for every $t_{0}>0$ ,

[TABLE]

Moreover, Theorem 2.2 of [27, p. 386] and the algebra relating cumulants to moments [24, pp. 266–7] show that there is a constant $C_{k}\in{\mathbb{R}}_{+}$ such that

[TABLE]

Combining (2.4) with the $t_{0}=1/k$ case of (2.3), we verify the statement in the lemma. ∎

Next are bounds on the moments of $M_{n}$ .

Lemma 2.4.

For each $k\in{\mathbb{N}}$ there exists $C^{\prime}_{k}\in{\mathbb{R}}_{+}$ with $\operatorname{\mathbb{E}}(M_{n}^{k})\leq C^{\prime}_{k}n^{-k}$ , for all $n\in{\mathbb{N}}$ .

Proof.

Let $k\in{\mathbb{N}}$ . By Lemma 2.3 and Markov’s inequality, for $0<t\leq 1$ and $n\in{\mathbb{N}}$ ,

[TABLE]

From the integration by parts formula for moments [10, p. 75], combined with (1.15) and the fact that $\operatorname{\mathbb{P}}(\frac{1}{n+1}\leq M_{n}\leq 1)=1$ for all $n\in{\mathbb{N}}$ , we obtain

[TABLE]

Hence from (2) and (2.5), for $n\in{\mathbb{N}}$ ,

[TABLE]

which yields the claimed bound, with $C^{\prime}_{k}:=1+2kC_{k+1}$ . ∎

We turn to centred moments of $M_{n}$ ; the intuition here, from the CLT in Proposition 1.1, is that $\sqrt{n}(nM_{n}-2)$ is tight. The precise statement that we need is the following, which exploits some further ideas from Pyke & van Zwet [27].

Lemma 2.5.

There is a constant $B_{0}<\infty$ such that

[TABLE]

Proof.

Since $(n+1)M_{n}\geq 1$ , a.s., it holds that $nM_{n}-2=(n+1)M_{n}-2-M_{n}$ satisfies

[TABLE]

By (2.8) we see that $|nM_{n}-2|\leq(n+2)M_{n}$ , a.s. Then, by Markov’s inequality and the fact that $\operatorname{\mathbb{E}}(M_{n}^{k})=O(n^{-k})$ from Lemma 2.4, we obtain

[TABLE]

Moreover, since $|nM_{n}-2|\leq n+2$ , a.s., it follows that

[TABLE]

Next we follow [27, pp. 400–1]. Let $n\in{\mathbb{N}}$ ; recall that $\operatorname{\mathbb{V}ar}N_{t}=v(t)$ and, from (2.1), that $\operatorname{\mathbb{E}}N_{t}=\mu(t)=\frac{2}{t}-1$ . From the inversion relation (1.15) and Chebyshev’s inequality,

[TABLE]

Similarly,

[TABLE]

In particular, taking $t=2n^{-1}+n^{-3/2}y$ , which, for $0\leq y\leq n^{2/3}$ has $t\in(0,1/2)$ for all $n>n_{0}:=6^{6/5}$ , the formula $v(t)=s_{0}/t$ (with $s_{0}=8\log 2-5$ ) from Proposition 2.1 yields

[TABLE]

Similarly, taking $t=2n^{-1}-n^{-3/2}y$ , for $0\leq y\leq 2\sqrt{n}$ , we get, for $n>4$ ,

[TABLE]

If $y\geq 2/\sqrt{n}$ and $n>4$ , then $\frac{n^{-1/2}y}{2-n^{-1/2}y}\geq\frac{1}{n-1}>\frac{1}{n}$ , so we obtain

[TABLE]

Summing (2) and (2.11), we conclude that, for all $n>n_{0}$ ,

[TABLE]

For a random variable $X\in{\mathbb{R}}_{+}$ and a constant $a\in(0,\infty)$ , we have (e.g. [10, p. 75])

[TABLE]

which, applied with $X=\sqrt{n}|nM_{n}-2|$ and $a=n^{2/3}$ , yields

[TABLE]

Combining the above bound with (2) completes the proof. ∎
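The scale $M_{n}\approx 2/n$ and the tightness of $\sqrt{n}(nM_{n}-2)$ discussed above are easy to observe numerically. The sketch below is an illustration under our own naming and parameter choices, not part of the proof: it simulates the Kakutani process with a max-heap and reports the root mean square of $\sqrt{n}(nM_{n}-2)$ over independent runs.

```python
import heapq
import random


def longest_gap(n, rng):
    """Length M_n of the longest sub-interval after n Kakutani splits of [0, 1]."""
    heap = [-1.0]  # heapq is a min-heap, so lengths are stored negated
    for _ in range(n):
        longest = -heapq.heappop(heap)
        u = rng.random()  # relative location of the split point
        heapq.heappush(heap, -u * longest)
        heapq.heappush(heap, -(1.0 - u) * longest)
    return -heap[0]


def rescaled_rms(n, runs, seed=0):
    """Root mean square of sqrt(n) * (n * M_n - 2) over independent runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        z = n ** 0.5 * (n * longest_gap(n, rng) - 2.0)
        total += z * z
    return (total / runs) ** 0.5
```

If the second moment of $\sqrt{n}(nM_{n}-2)$ is bounded uniformly in $n$ , the value returned by `rescaled_rms` should remain of order one as $n$ grows.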

The last result of this section is more technical in nature, concerning moments of harmonic sums of the $M_{n}$ ; it plays an important role in the sections below.

Corollary 2.6.

There is a constant $B_{1}<\infty$ such that, for all $k\in{\mathbb{N}}$ and all $n\in{\mathbb{N}}$ ,

[TABLE]

Proof.

Define $W_{n}:=\sum_{j=0}^{n-1}M^{-1}_{j}$ . Since $\inf_{0\leq i\leq n-1}M_{i}\geq M_{n-1}\geq 1/n$ , note that $W_{n}\leq n^{2}$ , a.s. Let $k\in{\mathbb{Z}}_{+}$ , and observe that, for every $n\in{\mathbb{N}}$ ,

[TABLE]

using the bounds $W_{n}\leq n^{2}$ and $M_{i}\geq 1/n$ . By Lemma 2.5, we thus obtain

[TABLE]

where $B_{1}:=2B_{0}+1<\infty$ . Using (2.13) and the triangle inequality,

[TABLE]

An induction on $k$ using the above relation, and the fact that $W_{n}^{0}=1$ , then shows that $\operatorname{\mathbb{E}}|W_{n}^{k}-(n^{2}/4)^{k}|\leq B_{1}kn^{2k-(1/2)}$ , for all $k,n\in{\mathbb{N}}$ . This yields (2.12). ∎
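To see the normalisation $n^{2}/4$ in Corollary 2.6 numerically, one can accumulate $W_{n}=\sum_{j=0}^{n-1}M_{j}^{-1}$ along a simulated path. The following is a minimal sketch with names and parameters of our own choosing; it relies only on the definition of $W_{n}$ and the splitting rule.

```python
import heapq
import random


def reciprocal_sum(n, rng):
    """W_n = sum over j = 0, ..., n-1 of 1 / M_j along one Kakutani path."""
    heap = [-1.0]  # negated lengths, so the heap root is the longest gap
    w = 0.0
    for _ in range(n):
        longest = -heapq.heappop(heap)  # this is M_j at step j
        w += 1.0 / longest
        u = rng.random()
        heapq.heappush(heap, -u * longest)
        heapq.heappush(heap, -(1.0 - u) * longest)
    return w


def normalised_reciprocal_sum(n, seed=0):
    """Ratio W_n / (n^2 / 4), expected to be close to 1 for large n."""
    return reciprocal_sum(n, random.Random(seed)) / (n * n / 4.0)
```

The deterministic bound $W_{n}\leq n^{2}$ used in the proof holds on every path, while the ratio returned by `normalised_reciprocal_sum` fluctuates around 1.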

3 Small-gap statistics

3.1 Conditional independence structure and moments

To obtain Theorem 1.5 on the smallest gap, it is not surprising that we investigate the count of small gaps and use Poisson approximation. However, our approach to studying the fine fluctuations of the largest gap turns out to make essential use of more detailed information about small gaps, and the primary focus of this section is to present this detailed information. Later in this section we will then present the proof of Theorem 1.5, the main ingredient being Corollary 3.2 that we state shortly.

Of course a typical gap has length about $1/n$ , and $M_{n}$ is of the same order (about $2/n$ ; see (1.1) and Proposition 1.1); on the other hand, as Theorem 1.5 advertises, one expects to see small gaps all the way down to size around $1/n^{2}$ . The results in this section will give more information on small gaps, including those with lengths $o(1/n)$ .

For $g:[0,1]\to{\mathbb{R}}$ , $n\in{\mathbb{N}}$ , and $0\leq s<t\leq 1$ , define the statistic

[TABLE]

In the case where $g\equiv 1$ , then ${\mathcal{K}}^{g}_{n}$ is a counting function; we use the particular notation $K_{n}:={\mathcal{K}}^{g}_{n}$ , in that case. That is, for $n\in{\mathbb{N}}$ and $0\leq s<t\leq 1$ ,

[TABLE]

the number of gaps of size in $(s,t]$ . Write $K_{n,t}:=K_{n}(0,t]$ . Some intuition for these quantities is provided by Pyke’s uniform limit theorem (1.13), a consequence of which is that, for $g:[0,1]\to{\mathbb{R}}$ bounded and measurable, and $0\leq\alpha<\beta\leq 2$ ,

[TABLE]

Another consequence of (1.13) is

[TABLE]

There are also second-order (fluctuation) results that complement (3.3)–(3.4), provided by [27]. However, these results are targeted at typical gaps, and are of limited value concerning ${\mathcal{K}}^{g}_{n}(s,t]$ when $t\ll 1/n$ . Indeed, roughly speaking, the asymptotic (3.4) says that $K_{n,t}=O(n^{2}t)+o(n)$ , a.s., but for $t\ll 1/n$ it is the $o(n)$ term that dominates.

The aim of this section is to provide a sharper study of small gaps, which will allow us to conclude, for example, that $K_{n,t}=O(n^{2}t)$ even when $nt\to 0$ (see Lemma 3.3 below). The additional structure we need is provided by the following important conditional independence result. In particular, representation (3.6) shows that $K_{n,t}$ can be represented as a sum of a random number of terms involving independent ${\mathrm{Unif}[0,1]}$ random variables.

Lemma 3.1.

Suppose that $0\leq s<t\leq 1$ . There exist random variables $\theta_{0},\theta_{1},\ldots$ and $\gamma_{0},\gamma_{1},\ldots$ , such that (i) $\gamma_{0},\gamma_{1},\ldots$ are i.i.d. ${\mathrm{Unif}[\frac{s}{t},1]}$ random variables, independent of the $\theta_{i}$ ; (ii) given $M_{0},M_{1},\ldots$ , the $\theta_{0},\theta_{1},\ldots\in\{0,1\}$ are independent with $\operatorname{\mathbb{P}}(\theta_{i}=1\mid M_{0},M_{1},\ldots)=2(t-s)/M_{i}$ ; (iii) for every $g:[0,1]\to{\mathbb{R}}$ and every $n\in{\mathbb{Z}}_{+}$ for which $2nt\leq 1$ , we have the representation

[TABLE]

In other words, for fixed $n\in{\mathbb{N}}$ and $0\leq s<t$ with $2nt\leq 1$ , we can write

[TABLE]

where $\upsilon_{1},\upsilon_{2},\ldots$ are i.i.d. ${\mathrm{Unif}[\frac{s}{t},1]}$ , independent of $K_{n}(s,t]$ .

Proof.

Fix $n\in{\mathbb{N}}$ and $t>0$ with $2nt\leq 1$ . Since $2nt\leq 1$ and $(i+1)M_{i}>1$ , a.s., we have $M_{i}>\frac{1}{i+1}\geq\frac{1}{n}\geq 2t$ for all $i\in\{0,1,2,\ldots,n-1\}$ . Hence splitting the interval of length $M_{i}$ ( $0\leq i\leq n-1$ ) can never remove a gap of length in $(s,t]$ , and creates either zero or one gap of length in $(s,t]$ , so that ${\mathcal{K}}^{g}$ increases according to

[TABLE]

recalling from (1.10) that $U_{i+1}\sim{\mathrm{Unif}[0,1]}$ is the relative location of the split point in the maximal interval. Note that, since $2nt\leq 1$ , we have $s<t<1/2$ , so that intervals $(s,t]$ and $[1-t,1-s)$ are disjoint, each of length $t-s$ . Moreover, conditional on $U_{i+1}M_{i}\in(s,t]$ , $U_{i+1}M_{i}/t$ has the ${\mathrm{Unif}[\frac{s}{t},1]}$ distribution; similarly for $(1-U_{i+1})M_{i}/t$ given $U_{i+1}M_{i}\in[1-t,1-s)$ . Thus we obtain the claimed representation (3.5) on setting

[TABLE]

for $V_{1},V_{2},\ldots$ a sequence of i.i.d. ${\mathrm{Unif}[\frac{s}{t},1]}$ random variables (merely to ensure that $\gamma_{i}$ has the correct distribution even if $\theta_{i}=0$ ).

The second expression (3.6) is obtained by ignoring the terms in (3.5) where $\theta_{i}=0$ and re-labelling so that $\upsilon_{j}:=\gamma_{h(j)}$ where $h(j):=\inf\{i\in{\mathbb{Z}}_{+}:\sum_{k=0}^{i}\theta_{k}=j\}$ . The number of non-zero terms is exactly $K_{n}(s,t]=\sum_{i=0}^{n-1}\theta_{i}$ , and the independence structure in (3.6) means that $K_{n}(s,t]$ is independent of the $\upsilon_{j}$ in (3.6). ∎

Taking $g\equiv 1$ in (3.5) gives the following useful fact.

Corollary 3.2.

Suppose that $n\in{\mathbb{N}}$ and $t>0$ satisfy $2nt\leq 1$ . Then

[TABLE]

where, given $M_{0},M_{1},M_{2},\ldots$ , the $\theta_{i}\in\{0,1\}$ are independent with $\operatorname{\mathbb{P}}(\theta_{i}=1\mid M_{i})=2t/M_{i}$ .
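Corollary 3.2 implies that $\operatorname{\mathbb{E}}K_{n,t}=2t\operatorname{\mathbb{E}}\sum_{i=0}^{n-1}M_{i}^{-1}$ whenever $2nt\leq 1$ , and by Corollary 2.6 the right-hand side is close to $n^{2}t/2$ . The sketch below is a simulation under our own naming and parameter choices, intended only as a sanity check: it compares Monte Carlo averages of $K_{n,t}$ and of $2t\sum_{i=0}^{n-1}M_{i}^{-1}$ .

```python
import heapq
import random


def count_and_compensator(n, t, rng):
    """Return (K_{n,t}, 2t * sum_{i<n} 1/M_i) for one Kakutani path."""
    heap = [-1.0]  # negated lengths, so the heap root is the longest gap
    w = 0.0
    for _ in range(n):
        longest = -heapq.heappop(heap)  # M_i at step i
        w += 1.0 / longest
        u = rng.random()
        heapq.heappush(heap, -u * longest)
        heapq.heappush(heap, -(1.0 - u) * longest)
    small = sum(1 for x in heap if -x <= t)  # gaps of length in (0, t]
    return small, 2.0 * t * w


def average_count_and_compensator(n, t, runs, seed=0):
    """Monte Carlo averages of the two quantities over independent paths."""
    rng = random.Random(seed)
    total_k = total_a = 0.0
    for _ in range(runs):
        k, a = count_and_compensator(n, t, rng)
        total_k += k
        total_a += a
    return total_k / runs, total_a / runs
```

For example, with $n=200$ and $t=1/(4n)$ both averages should be near $n^{2}t/2=25$ .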

Corollary 3.2 is the basis for the proof of Theorem 1.5, which uses Poisson approximation and is presented later in this section. For Theorem 1.3 we need to further develop analysis of $K_{n,t}$ . The following result gives asymptotics for the moments of $K_{n,t}$ , which will be a key ingredient in the subsequent arguments. Part (i) gives an upper bound valid for a broad range of the parameters, part (ii) gives sharp asymptotics for a more restrictive range of parameters, and part (iii) gives a tail bound.

Lemma 3.3.

Suppose that $n\in{\mathbb{N}}$ . Then the following hold.

(i)

For every $t>0$ , $k\in{\mathbb{Z}}_{+}$ , and $n\in{\mathbb{N}}$ with $2nt\leq 1$ ,

[TABLE]

(ii)

Let $\nu\in(1,\frac{3}{2})$ . There exist constants $\delta>0$ and $B_{2}<\infty$ such that, for all $n,k\in{\mathbb{N}}$ ,

[TABLE]

(iii)

For every $t>0$ with $2nt\leq 1$ , we have $\operatorname{\mathbb{P}}\left(K_{n,t}\geq 6tn^{2}\right)\leq\exp\left(-2t^{2}n^{3}\right)$ .

Proof.

Recall the representation for $K_{n,t}$ from Corollary 3.2, and that $\operatorname{\mathbb{P}}(\theta_{i}=1\mid M_{i})=2t/M_{i}\leq 2nt$ for $i\in\{0,1,\ldots,n-1\}$ , since $M_{i}>\frac{1}{i+1}$ , a.s. Consequently, the moment generating function of $K_{n,t}$ is dominated by that of a ${\mathrm{Bin}(n,2nt)}$ random variable. Hence (see [2, §3]) we may apply the moment bound from Theorem 1 of [2], which yields part (i).

Next we prove part (ii). For $k\in{\mathbb{N}}$ let $I_{n,k}:=\{0,1,\ldots,n-1\}^{k}$ . By (3.7),

[TABLE]

Let $I^{\circ}_{n,k}\subset I_{n,k}$ be the set of all $(i_{1},\ldots,i_{k})\in I_{n,k}$ for which the $k$ coordinates are distinct. Then $I_{n,k}$ contains $n^{k}$ elements, while $I^{\circ}_{n,k}$ contains $\frac{n!}{(n-k)!}$ elements. For $k\geq 2$ , every element of $I_{n,k}\setminus I^{\circ}_{n,k}$ contains at least one pair of the $k$ coordinates that match, so

[TABLE]

when $k=1$ , clearly $I_{n,1}\setminus I^{\circ}_{n,1}=\emptyset$ . Now, from (3.10),

[TABLE]

For the first term on the right-hand side of (3.12), by conditional independence,

[TABLE]

We bound the error between the sum on the right of (3.13) and the quantity

[TABLE]

from Corollary 2.6. Indeed, using (3.11) and the fact that $M_{i}>1/n$ for all $0\leq i\leq n-1$ ,

[TABLE]

Thus, taking expectations in (3.13), we obtain

[TABLE]

Then from Corollary 2.6, there is a $C<\infty$ (depending on $B_{1}$ ) such that, for all $n,k\in{\mathbb{N}}$ ,

[TABLE]

Fix $\delta>0$ with $4^{\delta}\leq{\mathrm{e}}^{(3/4)-(\nu/2)}$ (recall that $1<\nu<3/2$ ). Then $4^{\delta\log n}\leq n^{(3/4)-(\nu/2)}$ , and

[TABLE]

So we conclude that, for a constant $C<\infty$ , for all $n\in{\mathbb{N}}$ ,

[TABLE]

Similarly to (3.11), using the fact that the $\theta_{i}$ are $\{0,1\}$ -valued,

[TABLE]

using the $k-1$ cases of (3.10) and (3.8). Since $t\geq n^{-\nu}$ for $\nu<3/2$ , and $k=O(\log n)$ , we have that $k^{2}/(n^{2}t)$ is uniformly bounded. Hence, combining the preceding display and (3.15) with (3.12), we obtain, for some $C<\infty$ and all $n,k\in{\mathbb{N}}$ with $k\leq\delta\log n$ ,

[TABLE]

Since $n^{2}t\geq n^{2-\nu}$ , another application of (3.14) yields (3.9), completing the proof of (ii).

Finally, we prove (iii). Let ${\mathcal{F}}_{n}:=\sigma(U_{1},\ldots,U_{n})$ be the $\sigma$ -algebra generated by the first $n$ divisions; ${\mathcal{F}}_{0}$ is the trivial $\sigma$ -algebra. Fix $n\in{\mathbb{N}}$ with $2nt\leq 1$ . Note $\operatorname{\mathbb{E}}(K_{m+1,t}-K_{m,t}\mid{\mathcal{F}}_{m})=\operatorname{\mathbb{E}}(\theta_{m}\mid{\mathcal{F}}_{m})=2t/M_{m}$ by (3.7). Let $A_{0,t}:=0$ and, for $m\in{\mathbb{N}}$ , $A_{m,t}:=2t\sum_{j=0}^{m-1}1/M_{j}$ , and set $X_{m,t}:=K_{m,t}-A_{m,t}$ for $m\in{\mathbb{Z}}_{+}$ . Then $X_{m,t}$ is ${\mathcal{F}}_{m}$ -measurable, $A_{m+1,t}-A_{m,t}=2t/M_{m}$ , and, provided $m\leq n$ ,

[TABLE]

Thus $X_{0,t},X_{1,t},\ldots,X_{n,t}$ is a martingale. Moreover, $0\leq K_{m+1,t}-K_{m,t}\leq 1$ and $0\leq A_{m+1,t}-A_{m,t}\leq 2(m+1)t\leq 1$ for $m\leq n-1$ , and so $\sup_{0\leq m\leq n-1}|X_{m+1,t}-X_{m,t}|\leq 1$ , a.s. We apply a one-sided Azuma–Hoeffding inequality (see e.g. [23, p. 46]) to obtain, for all $a\in{\mathbb{R}}_{+}$ , $\operatorname{\mathbb{P}}(X_{n,t}-X_{0,t}\geq a)\leq\exp(-a^{2}/(2n))$ . Since $A_{n,t}\leq 2t\sum_{i=0}^{n-1}(i+1)\leq 4tn^{2}$ , a.s.,

[TABLE]

which yields the tail bound in part (iii). ∎

3.2 Limit theorem for the smallest gap

In this section we use some standard Poisson approximation bounds, the representation given in Corollary 3.2 for counts $K_{n,t}$ of small gaps, defined at (3.2), and the reciprocal moments bounds in Corollary 2.6, to give a proof of Theorem 1.5 on the asymptotics of the smallest gap, $m_{n}=\min_{1\leq i\leq n+1}L_{n,i}$ defined at (1.8).

Recall from Corollary 3.2 that $K_{n,t}=\sum_{i=0}^{n-1}\theta_{i}$ , where the $\theta_{i}$ are supported on $\{0,1\}$ , are conditionally independent given $M_{0},M_{1},\ldots$ , and satisfy $\operatorname{\mathbb{P}}(\theta_{i}=1\mid M_{0},M_{1},\ldots)=2t/M_{i}$ . Throughout this section we let $C$ denote a positive, finite constant which is independent of $n$ and $t$ and whose value may vary from line to line.

Proof of Theorem 1.5.

For non-negative, integer-valued random variables $K$ and $Y$ , the corresponding total variation distance is denoted by

[TABLE]

Recall a classic bound of Le Cam [8] (see also [4, p. 3]): letting $I_{1},\ldots,I_{n}$ be independent Bernoulli random variables with $\operatorname{\mathbb{E}}I_{i}=p_{i}$ , the total variation distance (denoted below by $d_{\text{TV}}$ ) between $\sum_{i=1}^{n}I_{i}$ and a $\text{Pois}(\sum_{i=1}^{n}p_{i})$ random variable is bounded by $4.5\max_{1\leq i\leq n}p_{i}$ . Let $Y\sim\text{MP}(2t\sum_{i=0}^{n-1}M_{i}^{-1})$ have a mixed Poisson distribution; that is, conditional on $2t\sum_{i=0}^{n-1}M_{i}^{-1}$ , the random variable $Y$ has a Poisson distribution with this parameter. A conditioning argument combined with Le Cam’s result says that

[TABLE]

using the fact that $M_{i}>\frac{1}{1+i}$ , a.s., for all $i\in{\mathbb{Z}}_{+}$ . We may now approximate $Y$ by $Z\sim\text{Pois}(\frac{1}{2}n^{2}t)$ . By Theorem 1.C(i) of [4] we have that

[TABLE]

for some $C$ , where the final inequality follows from Corollary 2.6. Hence, by the triangle inequality there exists $C$ such that

[TABLE]

Then, we choose $t=\frac{2\theta}{n^{2}}$ for some $\theta>0$ and note that

[TABLE]

to obtain that there exists $C$ such that

[TABLE]

which immediately gives us that

[TABLE]

for any $x\leq 1+\log n$ . For $x>1+\log n$ we write

[TABLE]

By (3.16), the first term in this final maximum is at most $\frac{C(1+\log n)}{\sqrt{n}}+\frac{1}{{\mathrm{e}}n}$ , and thus (3.17) also holds for these values of $x$ and for a suitable choice of $C$ . ∎
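The proof above takes $t=2\theta/n^{2}$ , so that the approximating Poisson mean $\frac{1}{2}n^{2}t$ equals $\theta$ and $\operatorname{\mathbb{P}}(m_{n}>2\theta/n^{2})=\operatorname{\mathbb{P}}(K_{n,t}=0)$ should be close to ${\mathrm{e}}^{-\theta}$ . As an informal check, the following sketch estimates this survival probability by simulation; the function names, seed, and run counts are our own illustrative choices.

```python
import heapq
import math
import random


def smallest_gap(n, rng):
    """Length m_n of the shortest sub-interval after n Kakutani splits of [0, 1]."""
    heap = [-1.0]  # negated lengths, so the heap root is the longest gap
    for _ in range(n):
        longest = -heapq.heappop(heap)
        u = rng.random()
        heapq.heappush(heap, -u * longest)
        heapq.heappush(heap, -(1.0 - u) * longest)
    return -max(heap)  # largest negated value corresponds to the smallest length


def survival_estimate(n, theta, runs, seed=0):
    """Monte Carlo estimate of P(m_n > 2 * theta / n^2)."""
    rng = random.Random(seed)
    t = 2.0 * theta / n ** 2
    hits = sum(1 for _ in range(runs) if smallest_gap(n, rng) > t)
    return hits / runs


def exponential_survival(theta):
    """Limiting value exp(-theta) suggested by the Poisson approximation."""
    return math.exp(-theta)
```

For moderate $n$ the estimate returned by `survival_estimate(n, theta, runs)` can be compared with `exponential_survival(theta)`; agreement is only expected up to Monte Carlo error and the finite- $n$ error quantified in the theorem.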

4 Conditional Berry–Esseen bounds

The starting point of our proof of Theorem 1.10 is a decomposition of $N_{t}$ into a sum of independent, self-similar contributions, obtained by considering the evolution of the process subsequent to time $n$ . Fix $n\in{\mathbb{Z}}_{+}$ and $t\in(0,\frac{1}{n+1})$ . Since $\operatorname{\mathbb{P}}(M_{n}\geq\frac{1}{n+1}>t)=1$ , $\operatorname{\mathbb{P}}(N_{t}>n)=1$ . Extending (1.16) gives the representation, for $(N^{(i)}_{t})_{t>0}$ independent copies of $(N_{t})_{t>0}$ , independent of gap lengths $L_{n,1},\ldots,L_{n,n+1}$ (recall that $\sum_{i=1}^{n+1}L_{n,i}=1$ ),

[TABLE]

see e.g. Proposition 1.1 of [19] or [20]. As a starting-point for proving (non-quantitative) CLTs, there is some similarity between (4.1) and the approach of Dvoretzky & Robbins [9] in their proof of the CLT for Rényi’s parking model (see also Section 1.2).

Recall that ${\mathcal{F}}_{n}=\sigma(U_{1},\ldots,U_{n})$ defines the filtration to which the Kakutani process is adapted. For $n\in{\mathbb{Z}}_{+}$ and $t\in(0,\frac{1}{n+1})$ , define

[TABLE]

We will use the classical Berry–Esseen theorem to obtain the following conditional Berry–Esseen estimate; note that in (4.4) not only is the probability conditional on ${\mathcal{F}}_{n}$ , but so are the centering and scaling quantities $R_{n,t}$ and $V_{n,t}$ .

Lemma 4.1.

There is a constant $C\in{\mathbb{R}}_{+}$ such that, for all $n\in{\mathbb{N}}$ and all $t\in(0,\frac{1}{4(n+1)})$ ,

[TABLE]

Proof.

Fix $n\in{\mathbb{Z}}_{+}$ and $t\in(0,\frac{1}{4(n+1)})$ . Conditional on ${\mathcal{F}}_{n}$ , the summands in the expression given in (4.1) for $Y_{n,t}$ are independent (although not identically distributed). Denoting

[TABLE]

the Berry–Esseen theorem for sums of independent random variables with finite third moments (see Theorem 7.6.2 of [10, p. 356]) yields, for an absolute constant $C\in{\mathbb{R}}_{+}$ ,

[TABLE]

Using the elementary inequality $|a-b|^{3}\leq|a|^{3}+|b|^{3}$ , $a,b\in{\mathbb{R}}_{+}$ , we have

[TABLE]

for constant $C<\infty$ , from (2.1) and Lemma 2.3. Since $\sum_{i=1}^{n+1}L_{n,i}=1$ , it follows that

[TABLE]

On the other hand, by (2.2), provided that $t\in(0,\frac{1}{4(n+1)})$ ,

[TABLE]

Using (4.6) and the preceding bound for $\gamma_{n,t}(i)$ in (4.5) yields (4.4). ∎

To deduce Theorem 1.10 starting from Lemma 4.1, we need to examine the quantities $R_{n,t}$ and $V_{n,t}$ that appear as centering and scaling in (4.4). To do so, we define

[TABLE]

where $V_{n,t}$ is defined at (4.3) and $v(t)=\operatorname{\mathbb{V}ar}N_{t}$ is given by (2.2). A significant part of the remaining technical work of the paper is to obtain good asymptotic estimates for mixed moments of $R_{n,t}$ and $S_{n,t}$ (see Section 5). To facilitate this we derive, in the rest of the present section, basic properties of $R_{n,t}$ and $S_{n,t}$ , and crucial representations for $R_{n,t}$ and $S_{n,t}$ in terms of small-gap statistics as described in Section 3.

Lemma 4.2.

Suppose that $n\in{\mathbb{Z}}_{+}$ and $t\in(0,\frac{1}{n+1})$ , and define $R_{n,t}$ and $S_{n,t}$ by (4.2) and (4.7). Then $\operatorname{\mathbb{E}}R_{n,t}=0$ , and the following hold:

[TABLE]

Proof.

Clearly, $\operatorname{\mathbb{E}}R_{n,t}=0$ by (4.2). Since $Y_{n,t}=N_{t}-n$ by (4.1), for $k\leq n$ , $\operatorname{\mathbb{V}ar}(Y_{n,t}\mid{\mathcal{F}}_{k})=\operatorname{\mathbb{V}ar}(N_{t}\mid{\mathcal{F}}_{k})$ , and hence, by (4.7) and the fact that $\operatorname{\mathbb{V}ar}Y_{n,t}=v(t)$ ,

[TABLE]

By the (conditional) total variance formula, using (4.2) and (4.7),

[TABLE]

Comparison with (4.10) yields (4.8). Finally, the $k=0$ case of (4.8) yields (4.9). ∎

Recall from (2.2) that $v(t)=\operatorname{\mathbb{V}ar}N_{t}=s_{0}/t$ , $t\in(0,1/2)$ , where $s_{0}=8\log 2-5$ . Set

[TABLE]

The next result includes a representation for $S_{n,t}$ via two sum statistics of the form (3.1).

Lemma 4.3.

Suppose that $n\in{\mathbb{N}}$ and $t\in(0,\frac{1}{n+1})$ . Then, with $w$ defined at (4.11),

[TABLE]

Moreover, with $K_{n,t}$ defined at (3.2), whenever $t\in(0,\frac{1}{4(n+1)})$ it holds that

[TABLE]

Proof.

From Proposition 2.1 and the definition of $w$ from (4.11), we see that

[TABLE]

The function $w$ as defined in (4.11) satisfies $\sup_{0\leq t\leq 1}|w(t)|=\lim_{t\to 1-}|w(t)|=s_{0}$ , and so

[TABLE]

Now let $t\in(0,\frac{1}{n+1})$ . We have from (4.1) and conditional independence that

[TABLE]

Hence, from (4.7) and (4.14), since $t<\frac{1}{n+1}\leq 1/2$ ,

[TABLE]

which yields (4.12). If also $t<\frac{1}{4(n+1)}$ , we may apply (4.6) in (4.16) to obtain $S_{n,t}\leq s_{0}/(2t)$ , giving the first bound in (4.13). By (4.12), (4.15), and (3.2) we get $|S_{n,t}|\leq s_{0}K_{n}(0,t]+s_{0}K_{n}(t,2t]=s_{0}K_{n,2t}$ , which gives the second bound in (4.13). ∎

Remark 4.4.

Consider $t=\theta/n$ , so $n^{2}t=n\theta$ . From (3.3), it follows that, for $\theta\in(0,1)$ ,

[TABLE]

which, using the formula from (4.11) to evaluate the integral, takes the (negative) value

[TABLE]

Thus (4.12) says that we should expect $S_{n,t}$ to be genuinely of order $n^{2}t$ .

Next, we show that $R_{n,t}$ can be represented as a sum statistic of the form (3.1).

Lemma 4.5.

Let $n\in{\mathbb{N}}$ and take $t\in(0,1)$ . Then $R_{n,t}$ defined by (4.2) satisfies

[TABLE]

where ${\mathcal{K}}^{g}_{n}$ is given by (3.1) with $g:[0,1]\to[-1,1]$ defined by $g(u)=1-2u$ , and $K_{n,t}$ is defined at (3.2).

Proof.

Taking conditional expectations in (4.1) and using (2.1), we obtain

[TABLE]

using $\sum_{i=1}^{n+1}L_{n,i}=1$ and $t\in(0,1)$ . Thus for $g(u)=1-2u$ we identify from (3.1) that $R_{n,t}={\mathcal{K}}_{n}^{g}(0,t]$ , and since $|g(u)|\leq 1$ , we verify (4.17) using (3.2). ∎
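The identity behind Lemma 4.5 is purely algebraic once the gap lengths are fixed, and can be checked directly. The sketch below assumes, as the proof indicates, that $R_{n,t}$ is the conditional mean of $N_{t}-n$ given the gaps, centred by its expectation $\frac{2}{t}-1-n$ , with $\mu(s)=\frac{2}{s}-1$ for $s<1$ and $\mu(s)=0$ otherwise. The function names are ours, and the code is an illustration rather than part of the argument.

```python
import heapq
import random


def kakutani_gaps(n, rng):
    """The n + 1 gap lengths after n Kakutani splits of [0, 1]."""
    heap = [-1.0]  # negated lengths, so the heap root is the longest gap
    for _ in range(n):
        longest = -heapq.heappop(heap)
        u = rng.random()
        heapq.heappush(heap, -u * longest)
        heapq.heappush(heap, -(1.0 - u) * longest)
    return [-x for x in heap]


def centred_conditional_mean(gaps, t):
    """Sum of mu(t / L) over gaps L, minus the unconditional mean 2/t - 1 - n."""
    n = len(gaps) - 1
    # mu(t / L) = 2L/t - 1 when L > t, and 0 when L <= t
    conditional = sum(2.0 * length / t - 1.0 for length in gaps if length > t)
    return conditional - (2.0 / t - 1.0 - n)


def small_gap_statistic(gaps, t):
    """Sum of g(L / t) over gaps L <= t, with g(u) = 1 - 2u."""
    return sum(1.0 - 2.0 * length / t for length in gaps if length <= t)
```

For any $t\in(0,1)$ the two functions agree up to floating-point rounding, and the common value is bounded in absolute value by the number of gaps of length at most $t$ .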

Remark 4.6.

The bound $|R_{n,t}|\leq K_{n,t}$ from (4.17) shows that $|R_{n,t}|=O(n^{2}t)$ with high probability. This bound is poor, since (4.9) and Lemma 4.3 say $\operatorname{\mathbb{V}ar}R_{n,t}=\operatorname{\mathbb{E}}S_{n,t}=O(n^{2}t)$ , and $\operatorname{\mathbb{E}}R_{n,t}=0$ , so one expects $|R_{n,t}|$ to be around $O(nt^{1/2})$ . Indeed, if $\theta\in(0,1)$ , the fluctuation results of Pyke & van Zwet (Theorem 6.2 of [27]) show that $n^{-1/2}R_{n,\theta/n}$ has a Gaussian limit. However, when $nt\to 0$ this result says only that $n^{-1/2}R_{n,t}\to 0$ in probability. Proposition 5.1 below includes asymptotics for the moments $\operatorname{\mathbb{E}}(R_{n,t}^{p})$ that address these points, giving finer control on the asymptotics of $R_{n,t}$ for a broader range of $t$ .

5 Conditional means, variances, and their moments

The aim of this section is to establish the following asymptotics on the mixed moments of $R_{n,t}$ and $S_{n,t}$ defined at (4.2) and (4.7) respectively. The result is in two parts, depending on the parity of the exponent of $R_{n,t}$ ; recall that $\operatorname{\mathbb{E}}R_{n,t}=0$ .

Proposition 5.1.

Suppose that $\nu\in(1,\frac{3}{2})$ . Then the following hold:

(i)

There exist constants $C<\infty$ and $\delta>0$ such that, for all $n\in{\mathbb{N}}$ , all $t\in(n^{-\nu},\frac{1}{4(n+1)})$ , and all $p\in 2{\mathbb{Z}}_{+}$ and $q\in{\mathbb{Z}}_{+}$ with $1\leq p+q\leq\delta\log n$ ,

[TABLE]

(ii)

There exist constants $C<\infty$ and $\delta>0$ such that, for all $n\in{\mathbb{N}}$ , all $t\in(n^{-\nu},\frac{1}{4(n+1)})$ , and all $p-1\in 2{\mathbb{Z}}_{+}$ and $q\in{\mathbb{Z}}_{+}$ with $1\leq p+q\leq\delta\log n$ ,

[TABLE]

We give the proof of Proposition 5.1 later in this section. Lemma 3.1, on statistics of small gaps, combined with Lemmas 4.5 and 4.3 which represent, respectively, $R_{n,t}$ and $S_{n,t}$ in terms of small-gap functionals, enables us to represent $R_{n,t}$ and $S_{n,t}$ in the form

[TABLE]

where $u_{i}\sim{\mathrm{Unif}[-1,1]}$ , $v_{j}\sim{\mathrm{Unif}[1,2]}$ , and, conditional on $K_{n}(0,t]$ and $K_{n}(t,2t]$ , the random variables $u_{1},u_{2},\ldots$ and $v_{1},v_{2},\ldots$ are all mutually independent. Write

[TABLE]

Then (5.3) is equivalent to

[TABLE]

The expressions for the moments of $K_{n,t}$ from Lemma 3.3, together with the representation (5.5) and the associated conditional independence structure (made explicit in Lemma 5.3 below), are our starting point for the proof of Proposition 5.1.

Remark 5.2.

With $w$ as defined at (4.11), and $v_{1}\sim{\mathrm{Unif}[1,2]}$ , some calculus shows that the expectation of each summand appearing in (5.4) is

[TABLE]

which, by comparison with the formula for $s_{0}$ from Proposition 2.1, shows that

[TABLE]

Thus from the representation (5.5) we confirm that $\operatorname{\mathbb{E}}R_{n,t}=0$ (as is clear from (4.2)) and

[TABLE]

Moreover, it also follows from (5.5) that

[TABLE]

so the relation (5.7) recovers $\operatorname{\mathbb{E}}S_{n,t}=\operatorname{\mathbb{V}ar}R_{n,t}$ , as at (4.9).

The next result summarizes the structure of the components of $S_{n,t}$ expressed in (5.5).

Lemma 5.3.

Let $n\in{\mathbb{N}}$ and $t\in(0,\frac{1}{2(n+1)})$ . Conditional on $K_{n,2t}=k\in{\mathbb{Z}}_{+}$ , the random variables $K_{n,t}$ , $R_{n,t}$ , $W_{n,t}$ have the representation

[TABLE]

where $K\sim{\mathrm{Bin}(k,1/2)}$ , the $u_{i}\sim{\mathrm{Unif}[-1,1]}$ , and $v_{j}\sim{\mathrm{Unif}[1,2]}$ are all independent.

Proof.

The representation for $R_{n,t}$ from (5.5) (coming from Lemma 4.5 and Lemma 3.1) together with the definition of $W_{n,t}$ from (5.4) gives

[TABLE]

where, given $K_{n}(0,t]$ and $K_{n}(t,2t]$ , the $u_{i},v_{j}$ are all mutually independent with $u_{i}\sim{\mathrm{Unif}[-1,1]}$ and $v_{j}\sim{\mathrm{Unif}[1,2]}$ . By (3.2), $K_{n}(t,2t]=K_{n}(0,2t]-K_{n}(0,t]$ and, with the notation from (3.1), $K_{n}(0,t]={\mathcal{K}}_{n}^{g}(0,2t]$ with $g(u):={\mathbbm{1}\mkern-1.5mu}{\{u\leq 1/2\}}$ . Hence Lemma 3.1 shows that, given $K_{n}(0,2t]=k$ , $K_{n}(0,t]\sim{\mathrm{Bin}(k,1/2)}$ . ∎
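The binomial thinning in Lemma 5.3 says that, among the gaps of length at most $2t$ , each independently has length at most $t$ with probability $1/2$ . A crude numerical consequence is that the pooled ratio of the counts $K_{n,t}$ and $K_{n,2t}$ over many runs should be close to $1/2$ . The sketch below checks this by simulation; the names and parameters are our own, and we take $t=1/(8n)$ so that $2n\cdot 2t\leq 1$ .

```python
import heapq
import random


def kakutani_gaps(n, rng):
    """The n + 1 gap lengths after n Kakutani splits of [0, 1]."""
    heap = [-1.0]  # negated lengths, so the heap root is the longest gap
    for _ in range(n):
        longest = -heapq.heappop(heap)
        u = rng.random()
        heapq.heappush(heap, -u * longest)
        heapq.heappush(heap, -(1.0 - u) * longest)
    return [-x for x in heap]


def thinning_ratio(n, t, runs, seed=0):
    """Pooled ratio (number of gaps <= t) / (number of gaps <= 2t) over all runs."""
    rng = random.Random(seed)
    below_t = 0
    below_2t = 0
    for _ in range(runs):
        gaps = kakutani_gaps(n, rng)
        below_t += sum(1 for g in gaps if g <= t)
        below_2t += sum(1 for g in gaps if g <= 2.0 * t)
    return below_t / below_2t
```

This tests only the mean of the conditional binomial law, not the full independence structure asserted in the lemma.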

We will use Lemma 5.3 to obtain, in Lemma 5.4 below, estimates for the mixed moments of $K_{n,t}$ , $R_{n,t}$ , and $W_{n,t}$ . In the proof, we will make use of auxiliary results stated in Appendix A below, including estimates of moments of random sums, like those appearing in the triple representation in Lemma 5.3, given in Lemma A.1.

Lemma 5.4.

(i)

Suppose that $n\in{\mathbb{N}}$ and $t\in(0,\frac{1}{2(n+1)})$ . Then, for $a,b,c\in{\mathbb{Z}}_{+}$ ,

[TABLE]

(ii)

There is a constant $C<\infty$ such that for all $n\in{\mathbb{N}}$ , all $t\in(0,\frac{1}{2(n+1)})$ , and all $a,b,c\in{\mathbb{Z}}_{+}$ with $(a+b+c)^{2}\leq n^{2}t$ ,

[TABLE]

(iii)

Let $\nu\in(1,\frac{3}{2})$ . There are constants $\delta>0$ and $C<\infty$ such that, for all $n\in{\mathbb{N}}$ , for all $t\in(n^{-\nu},\frac{1}{2(n+1)})$ , and for all $a,b,c\in{\mathbb{Z}}_{+}$ with $1\leq a+b+c\leq\delta\log n$ ,

[TABLE]

In the following proof, and frequently later on, we will need simple inequalities relating $(2n)!$ and $n!$ derived from

[TABLE]

Proof of Lemma 5.4.

Lemma 5.3 shows that $(K_{n,t},R_{n,t},W_{n,t})$ and $(K_{n,t},-R_{n,t},W_{n,t})$ have the same distribution, and this yields (5.8), and hence proves part (i).

For parts (ii)–(iii), denote, similarly to Lemma A.1, the moments

[TABLE]

for independent sequences $u_{i}\sim{\mathrm{Unif}[-1,1]}$ and $w_{j}:=-w(1/v_{j})$ , $v_{j}\sim{\mathrm{Unif}[1,2]}$ . By Lemma 5.3,

[TABLE]

Since $\sup_{0\leq t\leq 1}|w(t)|=s_{0}$ , $|W_{n,t}|\leq s_{0}K_{n,2t}$ by (5.4), and $K_{n,t}\leq K_{n,2t}$ by (3.2), so

[TABLE]

since $s_{0}\in(0,1)$ . By Lemma A.1 (i) applied with $\xi\sim{\mathrm{Unif}[-1,1]}$ , noting that $\operatorname{\mathbb{E}}(\xi^{r})=\frac{1}{1+r}$ for $r\in 2{\mathbb{Z}}_{+}$ (the odd moments vanish by symmetry), there is a constant $C<\infty$ such that, for all $b\in{\mathbb{Z}}_{+}$ and all $n\in{\mathbb{N}}$ ,

[TABLE]

by (5.11). From (5.12)–(5.13) there is $C<\infty$ such that, for all $n\in{\mathbb{N}}$ and all $a,b,c\in{\mathbb{Z}}_{+}$ ,

[TABLE]

Now using Lemma 3.3 (i), we verify part (ii).

Finally, for part (iii), suppose $a+b+c\geq 1$ . From Lemma 5.3, with $K\sim{\mathrm{Bin}(k,1/2)}$ ,

[TABLE]

Here, by Lemma A.1 (ii) applied with $\xi=-w(1/v)$ , $v\sim{\mathrm{Unif}[1,2]}$ , and the fact that $|w_{j}|\leq s_{0}<3/5$ and $\operatorname{\mathbb{E}}\xi=\gamma\in(0,3/5)$ , by (5.6), for all $c\in{\mathbb{Z}}_{+}$ and all $n\in{\mathbb{N}}$ ,

[TABLE]

Combining this bound with (5.13), using $pq-rs=r(q-s)+s(p-r)+(p-r)(q-s)$ , we obtain, for some $C<\infty$ and all $b,c\in{\mathbb{Z}}_{+}$ and all $n,m\in{\mathbb{N}}$ ,

[TABLE]

Using the bound (5.15) in (5.14), we get, for $K\sim{\mathrm{Bin}(k,1/2)}$ ,

[TABLE]

by an application of Lemma A.2. Another application of Lemma A.2 shows that

[TABLE]

It follows that

[TABLE]

by Lemma 3.3 (i), provided that $(a+b+c)^{2}\leq n^{2}t$ . For $\nu\in(1,\frac{3}{2})$ , take $\delta>0$ as in Lemma 3.3 (ii). Assuming that $t\geq n^{-\nu}$ and $a+b+c\leq\delta\log n$ , we have that, indeed, $(a+b+c)^{2}\leq n^{2}t$ for all $n$ large enough. Moreover, applying Lemma 3.3 (ii) to estimate $\operatorname{\mathbb{E}}(K_{n,2t}^{a+b+c})$ , we obtain (5.10), noting that $n^{2}t<n$ , so that the $n^{-1/2}$ term in the error coming from that lemma is negligible compared to $(n^{2}t)^{-1/2}$ . This proves part (iii). ∎

Proof of Proposition 5.1.

Let $\nu\in(1,\frac{3}{2})$ and $\delta>0$ as in Lemma 5.4 (iii). Suppose $p,q\in{\mathbb{Z}}_{+}$ with $1\leq p+q\leq\delta\log n$ . We can use (5.5) and a trinomial expansion to write

[TABLE]

where $W_{n,t}$ is defined at (5.4), and the sum is over $a,b,c\in{\mathbb{Z}}_{+}$ whose sum is equal to $q$ . Taking expectations in (5.16) and applying (5.8), we obtain

[TABLE]

Provided $t\in(n^{-\nu},\frac{1}{2(n+1)})$ , we have $n^{2}t\geq n^{2-\nu}$ and so $a+\frac{p+b}{2}+c=q+\frac{p}{2}-\frac{b}{2}\leq p+q$ where $(p+q)^{2}\leq\delta^{2}(\log n)^{2}\leq n^{2}t$ for all but finitely many $n\in{\mathbb{N}}$ . Hence the hypotheses of parts (ii) and (iii) of Lemma 5.4 are both satisfied when considering $\operatorname{\mathbb{E}}(K_{n,t}^{a}R_{n,t}^{p+b}W_{n,t}^{c})$ .

First suppose that $p\in 2{\mathbb{Z}}_{+}$ ; note that $q+\frac{p}{2}\geq 1$ in this case. In the sum in (5.17), we show that the terms with $b=0$ are dominant. To this end, consider

[TABLE]

using the upper bound from Lemma 5.4 (ii). Since $s_{0}/2<1$ , using (5.11),

[TABLE]

since $(x+y)!\leq(x+y)^{y}x!$ for $x,y\in{\mathbb{Z}}_{+}$ . Now

[TABLE]

using the fact that $(p+q)^{2}\leq n^{2}t$ . Thus, for some $C<\infty$ and all $p,q$ with $p+q\leq\delta\log n$ ,

[TABLE]

On the other hand, from Lemma 5.4 (iii) it follows that

[TABLE]

Furthermore,

[TABLE]

using the relation (5.7) between $s_{0}$ and $\gamma$ . Combined with (5.18), we verify (5.1).

Finally, suppose that $p-1\in 2{\mathbb{Z}}_{+}$ . Then if $q=0$ we have $\operatorname{\mathbb{E}}(R_{n,t}^{p})=0$ and (5.2) is trivial, so suppose that $q\in{\mathbb{N}}$ ; note that $q+\frac{p}{2}\geq\frac{3}{2}$ in this case. In the sum in (5.17), we show that now the terms with $b=1$ are dominant. Now, by (5.17),

[TABLE]

using the upper bound from Lemma 5.4 (ii). Now following a similar argument to that leading to (5.18) we obtain, for some $C<\infty$ and all $p,q$ with $p+q\leq\delta\log n$ ,

[TABLE]

On the other hand, from Lemma 5.4 (iii) it follows that

[TABLE]

Furthermore,

[TABLE]

using (5.7). Combined with (5.19), we verify (5.2). ∎

6 Completing the proofs of the main theorems

In this section we combine the ingredients developed so far with an expansion in Hermite polynomials to prove our quantitative CLTs for $N_{t}$ (Theorem 1.10) and $M_{n}$ (Theorem 1.3). A key intermediate result is Proposition 6.1 below. Recall the definitions of $v(t)$ , $R_{n,t}$ and $S_{n,t}$ from (2.2), (4.2) and (4.7), respectively, and for $\ell\in{\mathbb{Z}}_{+}$ define

[TABLE]

The following result reduces the proof of Theorem 1.10 to controlling (sums of) the quantities ${\mathcal{E}}^{\ell}_{n,t}$ from (6.1). Let $\Gamma$ denote the Euler gamma function, so $\Gamma(x+1)=x!$ , $x\in{\mathbb{Z}}_{+}$ .

Proposition 6.1.

Let $\nu\in(1,\frac{3}{2})$ , and let $\delta>0$ be as in Proposition 5.1. Then there exist constants $c_{0}>0$ and $C<\infty$ such that, for all $n\in{\mathbb{N}}$ and $t>0$ such that $n^{1-\nu}\leq nt\leq c_{0}$ ,

[TABLE]

The main additional ingredient in the proof of Proposition 6.1 is a Hermite–Edgeworth-type expansion. Denote by $H_{n}$ the Hermite polynomial of degree $n\in{\mathbb{Z}}_{+}$ , which satisfies

[TABLE]

where $\phi(x):=(2\pi)^{-1/2}{\mathrm{e}}^{-x^{2}/2}$ is the standard Gaussian density: see e.g. [29, §20.2]. We will need the following inequality from [21, p. 78]:

[TABLE]

note that $|H_{2n}(0)|=2^{-n}(2n)!/n!$ shows this bound is not far from optimal.
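The identity $|H_{2n}(0)|=2^{-n}(2n)!/n!$ quoted above can be checked numerically. The following is a minimal sketch, assuming that $H_{n}$ is the probabilists' Hermite polynomial generated by the three-term recurrence $H_{k+1}(x)=xH_{k}(x)-kH_{k-1}(x)$ (the convention consistent with the stated value at $0$); the helper name `hermite_at` is ours and purely illustrative.

```python
from math import factorial


def hermite_at(n, x):
    """Value at x of the probabilists' Hermite polynomial of degree n.

    Assumed convention: H_0 = 1, H_1 = x, H_{k+1}(x) = x*H_k(x) - k*H_{k-1}(x).
    """
    h_prev, h = 1, x
    if n == 0:
        return h_prev
    for k in range(1, n):
        # Three-term recurrence step from degrees (k-1, k) to degree k+1.
        h_prev, h = h, x * h - k * h_prev
    return h


# Compare |H_{2n}(0)| with 2^{-n} (2n)! / n! in exact integer arithmetic.
for n in range(8):
    lhs = abs(hermite_at(2 * n, 0))
    rhs = factorial(2 * n) // (2 ** n * factorial(n))
    assert lhs == rhs
```

Integer arithmetic is used throughout so that the comparison is exact rather than subject to floating-point error.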

Lemma 6.2.

Let $m\in{\mathbb{N}}$ . Then, for all $z\in{\mathbb{R}}$ and all $\alpha\in(-\infty,1/2]$ ,

[TABLE]

Proof.

For $z\in{\mathbb{R}}$ and $\alpha\in(-\infty,1/2]$ , let $Y\sim{\mathcal{N}}(-z,1-\alpha)$ ; then

[TABLE]

Let $\operatorname{{\mathrm{Re}}}\xi$ denote the real part of $\xi\in\mathbb{C}$ . The random variable $Y$ has characteristic function

[TABLE]

Using the Taylor series for the exponential function with complex argument

[TABLE]

we obtain, for $t\in{\mathbb{R}}$ ,

[TABLE]

A standard inversion formula (see e.g. Theorem 3.2.1 of [21, p. 31]) gives

[TABLE]

Using the same formula for the standard Gaussian characteristic function shows that

[TABLE]

Hence, by (6.5),

[TABLE]

provided $\alpha\in(-\infty,1/2]$ . Now, using the binomial theorem to expand $g_{z,\alpha}(t)^{j}$ ,

[TABLE]

For $n\in{\mathbb{Z}}_{+}$ we have the equality

[TABLE]

Hence,

[TABLE]

So we obtain

[TABLE]

Thus we conclude that

[TABLE]

The stated result now follows by re-expressing the sums over $k$ and $\ell=j+k-1$ . ∎

Proof of Proposition 6.1.

Let $n\in{\mathbb{N}}$ and $t\in(0,\frac{1}{4(n+1)})$ . Recall $R_{n,t}$ and $V_{n,t}$ from (4.2) and (4.3). We apply the conditional Berry–Esseen result, Lemma 4.1, which has random centering and scaling, to obtain, for deterministic centering and scaling,

[TABLE]

for ${\mathcal{F}}_{n}$ -measurable random variables $\Delta_{n,t}(x)$ satisfying $\sup_{x\in{\mathbb{R}}}|\Delta_{n,t}(x)|\leq CM_{n}^{2}t^{-3/2}$ , a.s., by (4.4). Recall that $\operatorname{\mathbb{V}ar}N_{t}=v(t)=s_{0}/t$ for $t\in(0,1/2)$ , by (2.2). Define

[TABLE]

where $S_{n,t}$ is defined at (4.7). Thus we can re-write the preceding bound as

[TABLE]

Note that, by (6.6) with the first inequality in (4.13) and our choice $t<\frac{1}{4(n+1)}$ , we have $\alpha_{n,t}=S_{n,t}/v(t)\leq 1/2$ . Moreover, for $\ell\in{\mathbb{Z}}_{+}$ we may express ${\mathcal{E}}^{\ell}_{n,t}$ as defined at (6.1) in terms of $\alpha_{n,t}$ and $Z_{n,t}$ defined at (6.6) via

[TABLE]

Then we apply Lemma 6.2 with $\alpha=\alpha_{n,t}\in(-\infty,1/2]$ and $z=Z_{n,t}\in{\mathbb{R}}$ to give

[TABLE]

Taking expectations in the above display and using the bound (6.4), we obtain

[TABLE]

Hence we take expectations in (6.7), noting that $\operatorname{\mathbb{E}}(M_{n}^{2})=O(n^{-2})$ by Lemma 2.4, to get

[TABLE]

Now suppose $t\in(n^{-\nu},\frac{1}{40(n+1)})$ , and take $\delta>0$ as specified in Proposition 5.1. From Proposition 5.1 (i) with $p=0$ and $q=m+1\leq\delta\log n$ , we have

[TABLE]

since $nt<1$ . On the other hand, $\operatorname{\mathbb{E}}(|Z_{n,t}|^{m+1})\leq Ct^{\frac{m+1}{2}}\operatorname{\mathbb{E}}(|R_{n,t}|^{m+1})$ . For $m-1\in 2{\mathbb{Z}}_{+}$ , we have from Proposition 5.1 (i) with $p=m+1\leq\delta\log n$ and $q=0$ ,

[TABLE]

Using Stirling’s formula, $\Gamma(x+1)\sim\sqrt{2\pi x}(x/{\mathrm{e}})^{x}$ , and the fact that $nt<1$ , it follows that

[TABLE]

For $m\in 2{\mathbb{Z}}_{+}$ , using Proposition 5.1 (i) (again) with $p=m+2\leq\delta\log n$ and $q=0$ , plus Jensen’s inequality, leads to the same conclusion. Thus from (6) we get

[TABLE]

provided $m\leq\delta\log n$ . Taking $m=\lfloor(\delta/3)\log n\rfloor$ and $nt\leq c<1/80$ , we have $(80nt)^{m}\leq(80c)^{(\delta/4)\log n}$ for all $n$ large enough; thus provided $c\leq c_{0}:={\mathrm{e}}^{-2/\delta}/80$ we have that $(80nt)^{m}\leq Cn^{-1/2}$ . Then from (6.10) we conclude (6.2). ∎
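The last step of the preceding proof can be unpacked as follows. This is a worked version under the stated choices $m=\lfloor(\delta/3)\log n\rfloor$ and $nt\leq c\leq c_{0}={\mathrm{e}}^{-2/\delta}/80$ ; the only ingredient added here is the elementary observation that $\lfloor(\delta/3)\log n\rfloor\geq(\delta/4)\log n$ for all $n$ large enough.

```latex
\begin{align*}
(80nt)^{m}
  &\leq (80c)^{m} \leq (80c)^{(\delta/4)\log n}
  && \text{since } 80nt\leq 80c<1 \text{ and } m\geq(\delta/4)\log n,\\
  &\leq \exp\Bigl(-\tfrac{2}{\delta}\cdot\tfrac{\delta}{4}\log n\Bigr)
   = n^{-1/2}
  && \text{since } 80c\leq 80c_{0}={\mathrm{e}}^{-2/\delta}.
\end{align*}
```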

With Proposition 6.1 in hand, the remaining task in the proof of Theorem 1.10 is to bound $\operatorname{\mathbb{E}}{\mathcal{E}}^{\ell}_{n,t}$ from (6.1) and hence control the sum on the right-hand side of (6.2). This is the purpose of the next result, which needs the full strength of Proposition 5.1.

Lemma 6.3.

Suppose that $\nu\in(1,\frac{3}{2})$ , and let $\delta>0$ be as in Proposition 6.1.

(i)

There exists $C<\infty$ such that, for all $n\in{\mathbb{N}}$ , all $t\in(n^{-\nu},\frac{1}{40(n+1)})$ , and all $\ell-1\in 2{\mathbb{Z}}_{+}$ with $1\leq\ell\leq(\delta/2)\log n$ ,

[TABLE]

(ii)

It holds that $\operatorname{\mathbb{E}}{\mathcal{E}}^{0}_{n,t}=0$ . Moreover, there exists $C<\infty$ such that, for all $n\in{\mathbb{N}}$ , all $t\in(n^{-\nu},\frac{1}{40(n+1)})$ , and all $\ell\in 2{\mathbb{Z}}_{+}$ with $1\leq\ell\leq(\delta/2)\log n$ ,

[TABLE]

Proof.

Suppose that $\nu\in(1,\frac{3}{2})$ , and let $\delta>0$ be as in Proposition 6.1. Take $n\in{\mathbb{N}}$ and $t\in(n^{-\nu},\frac{1}{40(n+1)})$ . Taking expectations in (6.1) and using (2.2), we have

[TABLE]

The idea is to apply Proposition 5.1 with $p=\ell-2k+1$ and $q=k$ , so that $1\leq p+q=\ell-k+1\leq\delta\log n$ for all $n$ large enough. For part (i), suppose that $\ell-1\in 2{\mathbb{Z}}_{+}$ . Then $\ell-2k+1\in 2{\mathbb{Z}}_{+}$ , and so from Proposition 5.1 (i) we get

[TABLE]

where in the bound we used $\sup_{p\in{\mathbb{N}}}(\frac{p}{2}+1)!(\frac{p}{2})!/p!<\infty$ , as follows from (5.11). Here

[TABLE]

and, similarly,

[TABLE]

From here we get (6.11). For part (ii), suppose $\ell\in 2{\mathbb{Z}}_{+}$ is even. Then $\ell-2k+1$ is odd, and so from Proposition 5.1 (ii) we get

[TABLE]

using $\sup_{p\in{\mathbb{N}}}(\frac{p+1}{2})!(\frac{p+1}{2})!/p!<\infty$ , by (5.11). Now

[TABLE]

and from here we obtain (6.12). ∎

Proof of Theorem 1.10.

Let $\delta>0$ be as in Proposition 5.1. Proposition 6.1 shows that, for constants $c_{0}>0$ and $C<\infty$ , for all $n\in{\mathbb{N}}$ and $t>0$ such that $n^{-1/4}\leq nt\leq c_{0}$ (taking $\nu=5/4$ , say), the bound (6.2) holds. In particular, taking $c_{1}\in(0,c_{0})$ for which $160c_{1}^{2}/s_{0}<1/2$ , the bound (6.2) holds whenever $t>0$ and $n=n_{t}:=\lfloor c_{1}/t\rfloor$ is sufficiently large. For $\ell\leq 2\lfloor(\delta/3)\log n\rfloor+1$ , we have $\ell\leq(\delta/2)\log n$ for all $n$ large enough, and so Lemma 6.3, with $n=n_{t}$ , shows that

[TABLE]

Hence from (6.2) we get

[TABLE]

by our choice of $n_{t}$ and of $c_{1}$ . ∎

Proof of Theorem 1.3.

Consider the interval $I_{n}:=[-\sqrt{n/\sigma^{2}},\sqrt{n/\sigma^{2}}]$ . We claim that it is a consequence of Theorem 1.10 and the relation $\operatorname{\mathbb{P}}(M_{n}\leq t)=\operatorname{\mathbb{P}}(N_{t}\leq n)$ from (1.15), which holds for all $n\in{\mathbb{N}}$ and all $t\in(0,1)$ , that there exists $C\in{\mathbb{R}}_{+}$ such that

[TABLE]

Assuming (for now) that (6.13) holds, we extend the bound over all $x\in{\mathbb{R}}$ using monotonicity and comparison to the small Gaussian tails. Indeed, we have from (6.13) that

[TABLE]

by standard tail bounds for $\Phi$ , and similarly for the positive tail. Then

[TABLE]

by (6.14). A similar argument applies to the values of $x>\sqrt{n/\sigma^{2}}$ , noting that

[TABLE]

where ${\overline{\Phi}}(x):=1-\Phi(x)$ . Thus Theorem 1.3 follows from (6.13). It remains to verify the claim (6.13), which we deduce from a careful inversion of Theorem 1.10.

For $n\in{\mathbb{N}}$ and $x\in I_{n}$ , define

[TABLE]

Observe that $t_{n}(x)\geq 1/n$ for $x\in I_{n}$ . It follows from (1.15) that

[TABLE]

Also observe that

[TABLE]

the term in brackets in (6.16) is positive for $x\in I_{n}$ , so $y_{n}(x)$ has the same sign as $x$ , and indeed

[TABLE]

By Theorem 1.10, there exists $C\in{\mathbb{R}}_{+}$ such that, for all $n\in{\mathbb{N}}$ and all $x\in{\mathbb{R}}$ ,

[TABLE]

Thus, by (6.15) and the fact that $t_{n}(x)\geq 1/n$ for $x\in I_{n}$ , we get

[TABLE]

It remains to compare $\Phi(y_{n}(x))$ to $\Phi(x)$ . Write (as previously) $\phi$ for the standard Gaussian density. By the mean value theorem, there exists $\theta=\theta_{n}(x)\in(0,1)$ such that

[TABLE]

Here, for $x\in I_{n}$ ,

[TABLE]

by (6.17). Moreover, from (6.16) we get $|y_{n}(x)-x|\leq Cx^{2}/\sqrt{n}$ for all $x\in I_{n}$ and some constant $C\in{\mathbb{R}}_{+}$ . Hence there is a constant $C\in{\mathbb{R}}_{+}$ such that

[TABLE]

Combining (6.18) and (6.20) we verify (6.13). ∎
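The inversion relation $\operatorname{\mathbb{P}}(M_{n}\leq t)=\operatorname{\mathbb{P}}(N_{t}\leq n)$ used in the proof above can be illustrated by simulation. The sketch below makes two assumptions that are not spelled out in this section: that $M_{n}$ is the longest sub-interval after $n$ splits of $[0,1]$ , and that $N_{t}$ is the first $n$ with $M_{n}\leq t$ . Under those assumptions the two events coincide on every sample path, because $M_{n}$ is non-increasing in $n$ . The function names are illustrative only.

```python
import heapq
import random


def kakutani_max_lengths(n_steps, rng):
    """Return [M_1, ..., M_n]: the longest sub-interval length after each split."""
    heap = [-1.0]  # max-heap via negated lengths, starting from the unit interval
    maxima = []
    for _ in range(n_steps):
        longest = -heapq.heappop(heap)
        u = rng.random()  # uniform splitting point inside the longest sub-interval
        heapq.heappush(heap, -longest * u)
        heapq.heappush(heap, -longest * (1.0 - u))
        maxima.append(-heap[0])
    return maxima


def threshold_time(maxima, t):
    """First n with M_n <= t (assumed definition of N_t); None if never reached."""
    for n, m in enumerate(maxima, start=1):
        if m <= t:
            return n
    return None


rng = random.Random(0)
maxima = kakutani_max_lengths(2000, rng)
t = 0.01
n_t = threshold_time(maxima, t)

# Pathwise check: {M_n <= t} holds exactly when {N_t <= n}.
for n, m in enumerate(maxima, start=1):
    assert (m <= t) == (n_t is not None and n_t <= n)
```

A heap keeps each split at logarithmic cost, which makes it cheap to repeat the experiment over many seeds and compare the empirical law of $M_{n}$ with the Gaussian approximation.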

Finally, we give the proofs of the corollaries from Section 1.2.

Proof of Corollary 1.7.

This is direct from Theorem 1.10 since, as explained in Section 1.2, Kingman’s embedding gives $T_{t}=N_{{\mathrm{e}}^{-t}}$ , $t\in(0,\infty)$ . ∎

Proof of Corollary 1.8.

Recalling the identification $\ell_{n}=\log(1/M_{n})$ , we have, for $x\in{\mathbb{R}}$ ,

[TABLE]

where

[TABLE]

It follows that, for some $C<\infty$ , $|b_{n}(x)-x|\leq Cx^{2}/\sqrt{n}$ , for all $x\in{\mathbb{R}}$ and all $n\in{\mathbb{N}}$ . Hence there exists $\delta>0$ such that $|b_{n}(x)-x|\leq|x|/2$ for all $x\in[-\delta\sqrt{n},\delta\sqrt{n}]$ , and, in particular, $b_{n}(\delta\sqrt{n})>(\delta/2)\sqrt{n}$ and $b_{n}(-\delta\sqrt{n})<-(\delta/2)\sqrt{n}$ . Consequently, from (6),

[TABLE]

by Theorem 1.3; similarly for $x\leq-\delta\sqrt{n}$ . On the other hand, by (6) and Theorem 1.3, there exists $C<\infty$ such that, for all $n\in{\mathbb{N}}$ and all $x\in[-\delta\sqrt{n},\delta\sqrt{n}]$ ,

[TABLE]

By the mean value theorem, similarly to (6.19), for all $n\in{\mathbb{N}}$ ,

[TABLE]

which completes the proof of (1.4). The proof of (1.5) is direct from the fact that

[TABLE]

and applying Theorem 1.5. ∎

Appendix A Moments of partial sums

We give two auxiliary results needed in the proof of Lemma 5.4. While we suspect that neither result is new, we have not been able to locate a reference. Lemma A.1 gives first-order expansions of moments of sums of i.i.d. summands, with quantitative error bounds, and has some similarities to the Marcinkiewicz–Zygmund and Rosenthal inequalities [10, pp. 146–153].

Lemma A.1.

Let $\xi$ be a random variable with $\operatorname{\mathbb{E}}(|\xi|^{k})<\infty$ for every $k\in{\mathbb{Z}}_{+}$ . Consider $\Xi_{n}:=\sum_{i=1}^{n}\xi_{i}$ , where $\xi_{1},\xi_{2},\ldots$ are i.i.d. copies of $\xi$ . Write $\mu_{k}(n):=\operatorname{\mathbb{E}}(\Xi_{n}^{k})$ , and

[TABLE]

(i)

Suppose that $\operatorname{\mathbb{E}}\xi=0$ . Then for all $k\in{\mathbb{Z}}_{+}$ and all $n\in{\mathbb{N}}$ ,

[TABLE]

(ii)

Suppose that $\operatorname{\mathbb{E}}\xi>0$ . Then for all $k\in{\mathbb{Z}}_{+}$ and all $n\in{\mathbb{N}}$ ,

[TABLE]

Proof.

First we prove (ii). Similarly to the proof of Lemma 3.3 (but with an index shift), write $I_{n,k}=\{1,\ldots,n\}^{k}$ and $I_{n,k}^{\circ}$ for vectors in $I_{n,k}$ with no two coordinates the same. Then, by the fact that the $\xi_{i}$ are i.i.d.,

[TABLE]

On the other hand, given $(i_{1},\ldots,i_{k})\in I_{n,k}$ in which $\ell\in\{1,\ldots,k\}$ distinct indices appear, with multiplicities $m_{1},\ldots,m_{\ell}$ satisfying $m_{1}+\cdots+m_{\ell}=k$ , by independence,

[TABLE]

where we used Lyapunov’s inequality for the middle step. Hence

[TABLE]

Moreover, by (3.11), we have $0\leq n^{k}-|I_{n,k}^{\circ}|\leq k^{2}n^{k-1}$ . It follows that

[TABLE]

where $A_{k}$ is as defined at (A.1). This yields part (ii). For part (i), note that

[TABLE]

since the product has expectation zero whenever $(i_{1},\ldots,i_{2k})$ has a coordinate which appears exactly once, and the combinatorial factor comes from the number of ways of pairing up coordinates. The rightmost expression in the last display is $\operatorname{\mathbb{E}}(\widetilde{\Xi}^{k}_{n})$ for $\widetilde{\Xi}_{n}=\sum_{i=1}^{n}\xi_{i}^{2}$ . Thus part (i) follows from part (ii), with $A_{k}^{\prime}$ in (A.1) being the quantity $A_{k}$ but with $\xi^{2}$ in place of $\xi$ . ∎
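The counting bound $0\leq n^{k}-|I_{n,k}^{\circ}|\leq k^{2}n^{k-1}$ used in the proof above is easy to test numerically. The sketch below relies on the standard fact that the number of $k$ -tuples from $\{1,\ldots,n\}$ with pairwise distinct coordinates is the falling factorial $n(n-1)\cdots(n-k+1)$ ; the function names and the tested ranges are ours, chosen for illustration.

```python
from math import perm


def injective_tuple_count(n, k):
    """Number of k-tuples from {1, ..., n} with no two coordinates equal."""
    return perm(n, k)  # n(n-1)...(n-k+1); equals 0 when k > n


def gap(n, k):
    """n^k minus the number of tuples with distinct coordinates."""
    return n ** k - injective_tuple_count(n, k)


# Check 0 <= n^k - |I°_{n,k}| <= k^2 n^{k-1} on a small grid.
for n in range(1, 40):
    for k in range(1, 12):
        assert 0 <= gap(n, k) <= k * k * n ** (k - 1)
```

The grid deliberately includes cases with $k>n$ , where no tuple has distinct coordinates and the bound reduces to $n\leq k^{2}$ .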

Lemma A.2.

Let $X\sim{\mathrm{Bin}(n,1/2)}$ . Then, for every $\alpha\in{\mathbb{Z}}_{+},\beta\in{\mathbb{Z}}_{+}$ with $\max(\alpha,\beta)\geq 1$ ,

[TABLE]

Proof.

Write $Q_{n}(x):=x^{\alpha}(n-x)^{\beta}$ and $Z_{n}:=X-(n/2)$ . If $\alpha\geq 1$ , it follows from the mean value theorem that $|(1+y)^{\alpha}-1|\leq\alpha 2^{\alpha-1}|y|$ , $|y|\leq 1$ . If $\alpha\in[0,1]$ , then $g(y):=((1+y)^{\alpha}-1)/y$ has $g^{\prime}(y)\in(-\infty,0)$ for all $|y|<1$ , and hence $\sup_{y\in[-1,1]}|g(y)|=g(-1)=1$ . Combining the two cases gives, for every $\alpha\geq 0$ ,

[TABLE]

Using (A.2) we observe that, for $0\leq x\leq n$ ,

[TABLE]

Hence, for $0\leq x\leq n$ ,

[TABLE]

Then, using (A),

[TABLE]

By Lyapunov’s inequality, $\operatorname{\mathbb{E}}|Z_{n}|\leq(\operatorname{\mathbb{E}}(Z_{n}^{2}))^{1/2}=(\operatorname{\mathbb{V}ar}X)^{1/2}=\sqrt{n/4}$ , so that

[TABLE]

Considering separately the cases $\min(\alpha,\beta)\geq 1$ and $\min(\alpha,\beta)=0$ (and using $2^{-x}\leq x/2$ for $x\geq 1$ ) it is not hard to verify that $2^{-\alpha-\beta}(c_{\alpha}+c_{\beta})\leq\max(\alpha,\beta)$ as long as $\max(\alpha,\beta)\geq 1$ . ∎
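The Lyapunov step $\operatorname{\mathbb{E}}|Z_{n}|\leq\sqrt{n/4}$ in the proof above can be verified exactly for small $n$ by summing over the binomial probability mass function. This is a minimal sketch; the function name and the range of $n$ are illustrative choices, and a small tolerance absorbs floating-point rounding (the bound holds with equality at $n=1$ ).

```python
from math import comb, sqrt


def mean_abs_deviation(n):
    """Exact E|X - n/2| for X ~ Bin(n, 1/2), summed over the mass function."""
    return sum(comb(n, x) * abs(x - n / 2) for x in range(n + 1)) / 2 ** n


# Compare E|Z_n| with the bound sqrt(Var X) = sqrt(n/4).
for n in range(1, 200):
    assert mean_abs_deviation(n) <= sqrt(n / 4) + 1e-12
```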

Acknowledgements

AW was supported by EPSRC grant EP/W00657X/1. Part of this work was undertaken during the programme “Stochastic systems for anomalous diffusion” (July–December 2024) hosted by the Isaac Newton Institute, under EPSRC grant EP/Z000580/1.

Bibliography

[1] R.L. Adler and L. Flatto, Uniform distribution of Kakutani’s interval splitting procedure. Z. Wahrsch. Verw. Gebiete 38 (1977) 253–259.
[2] T.D. Ahle, Sharp and simple bounds for the raw moments of the binomial and Poisson distributions. Statist. Probab. Lett. 182 (2022) 109306.
[3] P. Bak and K. Sneppen, Punctuated equilibrium and criticality in a simple model of evolution. Phys. Rev. Letters 71 (1993) 4083–4086.
[4] A.D. Barbour, L. Holst and S. Janson, Poisson Approximation, Oxford University Press, Oxford, 1992.
[5] J. Bérard and J.-B. Gouéré, Brunet-Derrida behavior of branching-selection particle systems on the line. Commun. Math. Phys. 298 (2010) 323–342.
[6] P. Bickel, M. Fiocco, M. de Gunst and F. Götze, Willem van Zwet’s research. Ann. Statist. 49 (2021) 2439–2447.
[7] E. Brunet and B. Derrida, Microscopic models of traveling wave equations. Computer Phys. Commun. 121–122 (1999) 376–381.
[8] L. Le Cam, An approximation theorem for the Poisson binomial distribution. Pacific J. Math. 10 (1960) 1181–1197.
