A large-deviations principle for all the cluster sizes of a sparse Erdős-Rényi graph

Luisa Andreis; Wolfgang König; Robert I. A. Patterson

arXiv:1901.01876·math.PR·April 26, 2021


TL;DR

This paper establishes a large-deviations principle for the distribution of all component sizes in a sparse Erdős-Rényi graph, capturing phase transition phenomena and linking to coagulation models.

Contribution

It provides an explicit rate function describing microscopic, mesoscopic, and macroscopic component sizes, including the phase transition at t=1.

Findings

01

Explicit large-deviations rate function for component sizes.

02

Captures phase transition at t=1.

03

Links to coagulation models and gelation phenomena.

Abstract

Let $G (N, \frac{1}{N} t_{N})$ be the Erd\H{o}s-R\'enyi graph with connection probability $\frac{1}{N} t_{N} \sim t / N$ as $N \to \infty$ for a fixed $t \in (0, \infty)$ . We derive a large-deviations principle for the empirical measure of the sizes of all the connected components of $G (N, \frac{1}{N} t_{N})$ , registered according to microscopic sizes (i.e., of finite order), macroscopic ones (i.e., of order $N$ ), and mesoscopic ones (everything in between). The rate function explicitly describes the microscopic and macroscopic components and the fraction of vertices in components of mesoscopic sizes. Moreover, it clearly captures the well known phase transition at $t = 1$ as part of a comprehensive picture. The proofs rely on elementary combinatorics and on known estimates and asymptotics for the probability that subgraphs are connected. We also draw conclusions for the strongly related model of…

Equations

$$S_{1}^{(N)}\geq S_{2}^{(N)}\geq\dots\geq S_{n}^{(N)}\geq 1,\qquad\sum_{i=1}^{n}S_{i}^{(N)}=N,$$

$${\rm Mi}^{(N)}=\frac{1}{N}\sum_{i=1}^{n}\delta_{S_{i}^{(N)}}\qquad\mbox{and}\qquad{\rm Ma}^{(N)}=\sum_{i=1}^{n}\delta_{\frac{1}{N}S_{i}^{(N)}}.$$

$${\mathcal{N}}(c)=\Big\{\Lambda=(\lambda_{k})_{k\in\mathbb{N}}\in[0,\infty)^{\mathbb{N}}\colon\sum_{k\in\mathbb{N}}k\lambda_{k}=c\Big\},\qquad c>0.$$

$${\mathcal{M}}_{\mathbb{N}_{0}}((0,1];c)=\Big\{\alpha\in{\mathcal{M}}_{\mathbb{N}_{0}}((0,1])\colon\int_{(0,1]}x\,\alpha({\rm d}x)=c\Big\},$$

$$\Lambda\mapsto c_{\Lambda}:=\sum_{k\in\mathbb{N}}k\lambda_{k}\qquad\mbox{and}\qquad\alpha\mapsto c_{\alpha}:=\int_{(0,1]}x\,\alpha({\rm d}x).$$

$$(\Lambda,\alpha)\mapsto I(\Lambda,\alpha;t)=\begin{cases}I_{\rm Mi}(\Lambda;t)+I_{\rm Ma}(\alpha;t)+(1-c_{\Lambda}-c_{\alpha})\Big(\frac{t}{2}-\log t\Big),&\mbox{if }c_{\Lambda}+c_{\alpha}\leq 1,\\ \infty&\mbox{otherwise,}\end{cases}$$

$$I_{\rm Mi}(\Lambda;t)$$

$$I_{\rm Ma}(\alpha;t)$$

$$\liminf_{N\to\infty}\frac{1}{N}\log\mathbb{P}_{N}\big(({\rm Mi}^{(N)},{\rm Ma}^{(N)})\in G\big)$$

$$\limsup_{N\to\infty}\frac{1}{N}\log\mathbb{P}_{N}\big(({\rm Mi}^{(N)},{\rm Ma}^{(N)})\in F\big)$$

$${\mathcal{I}}_{\rm Mi}(\Lambda;t)=\inf_{\alpha\in{\mathcal{M}}}I(\Lambda,\alpha;t)=I_{\rm Mi}(\Lambda;t)-(1-c_{\Lambda})\Big(\log\frac{1-{\rm e}^{(c_{\Lambda}-1)t}}{1-c_{\Lambda}}-\frac{c_{\Lambda}t}{2}\Big).$$

$${\mathcal{I}}_{\rm Ma}(\alpha;t)=I_{\rm Ma}(\alpha;t)+(1-c_{\alpha})\Big(\frac{t}{2}-\log t\Big)+C_{\alpha,t}\Big(\log(tC_{\alpha,t})-\frac{t}{2}C_{\alpha,t}\Big),$$

$$\overline{\rm Me}_{R,\varepsilon}^{(N)}=\frac{1}{N}\sum_{i\colon R<S_{i}^{(N)}<\varepsilon N}S_{i}^{(N)}.$$

$${\mathcal{J}}_{\rm Me}^{(R,\varepsilon)}(c;t)=\inf\Big\{I(\Lambda,\alpha;t)\colon\sum_{k=1}^{R}k\lambda_{k}+\int_{\varepsilon}^{1}x\,\alpha({\rm d}x)=1-c\Big\}.$$

$${\mathcal{J}}_{\rm Me}(c;t)=(1-c)\Big(\log\big((1-c)t\big)-\frac{(1-c)t}{2}\Big)+\frac{t}{2}-\log t.$$

$${\mathcal{J}}_{\rm Mi}(c;t)=\inf_{\Lambda\in{\mathcal{N}}(c)}{\mathcal{I}}_{\rm Mi}(\Lambda;t)\qquad\mbox{and}\qquad{\mathcal{J}}_{\rm Ma}(c;t)=\inf_{\alpha\in{\mathcal{M}}_{\mathbb{N}_{0}}((0,1];c)}{\mathcal{I}}_{\rm Ma}(\alpha;t),$$

$${\mathcal{J}}_{\rm Mi}(c;t)$$

$$\lambda_{k}^{*}(c;t)=\frac{k^{k-2}c^{k}t^{k-1}{\rm e}^{-ctk}}{k!},\qquad k\in\mathbb{N},$$

$$\inf_{(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}}I(\Lambda,\alpha;t)$$

$$\log\beta_{t}=t\beta_{t}-t.$$

$$k\lambda_{k}^{*}(c;t)=c\,{\rm Bo}_{ct}(k),$$

$$\big({\rm Mi}^{(N)},{\rm Ma}^{(N)}\big)\overset{N\to\infty}{\Longrightarrow}\begin{cases}(\Lambda^{*}(1;t),\mathbf{0})&\mbox{if }t\leq 1,\\ (\Lambda^{*}(\beta_{t};t),(1-\beta_{t},0,\dots))&\mbox{if }t\geq 1.\end{cases}$$

$$\frac{{\rm d}}{{\rm d}t}l_{k}(t)=\frac{1}{2}\sum_{m,m'\colon m+m'=k}l_{m}(t)\,l_{m'}(t)\,K(m,m')-l_{k}(t)\sum_{m}l_{m}(t)\,K(k,m),\qquad k\in\mathbb{N},$$

$$Z_{\Lambda_{N}}^{(\beta)}=\sum_{(\ell_{k})_{k\in\mathbb{N}}\in\mathbb{N}_{0}^{\mathbb{N}}\colon\sum_{k}k\ell_{k}=N}\ \prod_{k}\frac{N^{\ell_{k}}}{\ell_{k}!\,k^{\ell_{k}}}\Big[\rho(4\pi\beta k)^{\frac{d}{2}}\Big]^{-\ell_{k}},$$

$$f(\beta,\rho)=\lim_{N\to\infty}\frac{1}{N}\log Z_{\Lambda_{N}}^{(\beta)}=-\inf_{\Lambda\in{\mathcal{N}}(\rho)}I(\Lambda),\qquad\mbox{where}\qquad I(\Lambda)=\sum_{k}\lambda_{k}\log\frac{\lambda_{k}k}{(4\pi\beta k)^{\frac{d}{2}}{\rm e}}.$$

$$k\lambda_{k}^{({\rm ML})}(c;t)=\frac{1}{t}\frac{(ct\,{\rm e}^{-ct})^{k}}{k^{1-k}k!}\sim\frac{1}{\sqrt{2\pi}\,t}\frac{(ct\,{\rm e}^{-ct+1})^{k}}{k^{3/2}}\qquad\mbox{and}\qquad k\lambda_{k}^{({\rm BEC})}(\alpha;\beta)=\frac{1}{\rho(4\pi\beta)^{\frac{d}{2}}}\frac{{\rm e}^{-\alpha k}}{k^{\frac{d}{2}}},$$

$$\mu_{k}(p)=\mathbb{P}_{k,p}\big({\mathcal{G}}\text{ is connected}\big),\qquad k\in\mathbb{N},\ p\in[0,1].$$

$$E_{N}=\Big\{(s_{i})_{i\in\{1,\dots,n\}}\in\mathbb{N}_{0}^{n}\colon n\in\mathbb{N},\ s_{1}\geq s_{2}\geq\dots\geq 0,\ \sum_{i=1}^{n}s_{i}=N\Big\}.$$

$${\mathcal{N}}_{N}=\Big\{\ell=(\ell_{k})_{k}\in\mathbb{N}_{0}^{\mathbb{N}}\colon\sum_{k}k\ell_{k}=N\Big\},$$

$$\mathbb{P}_{N,p}\big((S_{i}^{(N)})_{i}=(s_{i})_{i}\big)=\#\{\pi\in{\mathcal{P}}_{N}\colon B_{k}(\pi)=\ell_{k}\,\forall k\}\times\Big(\prod_{i}\mu_{s_{i}}(p)\Big)\times\Big(\prod_{i\not=j}(1-p)^{\frac{1}{2}m_{i}\,m_{j}}\Big).$$


Full text

A large-deviations principle for all the cluster sizes of a sparse Erdős-Rényi graph

Luisa Andreis

WIAS, Mohrenstraße 39, 10117 Berlin

Wolfgang König

TU Berlin and WIAS, Mohrenstraße 39, 10117 Berlin

Robert I. A. Patterson

WIAS, Mohrenstraße 39, 10117 Berlin

Abstract.

Let ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ be the Erdős-Rényi graph with connection probability $\frac{1}{N}t_{N}\sim t/N$ as $N\to\infty$ for a fixed $t\in(0,\infty)$ . We derive a large-deviations principle for the empirical measure of the sizes of all the connected components of ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ , registered according to microscopic sizes (i.e., of finite order), macroscopic ones (i.e., of order $N$ ), and mesoscopic ones (everything in between). The rate function explicitly describes the microscopic and macroscopic components and the fraction of vertices in components of mesoscopic sizes. Moreover, it clearly captures the well known phase transition at $t=1$ as part of a comprehensive picture. The proofs rely on elementary combinatorics and on known estimates and asymptotics for the probability that subgraphs are connected. We also draw conclusions for the strongly related model of the multiplicative coalescent, the Marcus–Lushnikov coagulation model with monodisperse initial condition, and its gelation phase transition.

(6 January, 2021)

MSC 2020: 05C80, 60F10, 60K35, 82B26.

Keywords and phrases. Erdős-Rényi random graph, component sizes, large deviations, empirical measure, phase transition, sizes, multiplicative coalescent, gelation.

1. Introduction

In this paper, we study the Erdős-Rényi random graph ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ , that is, the random graph on the vertex set $[N]=\{1,\dots,N\}$ , where each two distinct vertices are independently connected with probability $\frac{1}{N}t_{N}$ . We will be working in the sparse regime, i.e., we assume that $\lim_{N\to\infty}t_{N}=t$ for some fixed $t\in(0,\infty)$ . This is the regime in which the famous phase transition of the emergence of a giant cluster at $t=1$ occurs, which was detected and characterised for the first time in the seminal paper [ER60]. For an extensive overview on the model see the classical reference [Bol01].

Our new contribution in this paper is a comprehensive study of the family of the sizes of all the connected components of ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ , registered according to the asymptotic order of the size in the limit as $N\to\infty$ . We distinguish here microscopic components (i.e., with size of order one), macroscopic components (i.e., size of order $N$ , usually referred to as giant components) and mesoscopic ones (everything in between). We summarize all this information in terms of two empirical measures and derive a large-deviation principle (LDP) for them. Our rate function is rather explicit.

Such a principle gives information about the exponential decay rate of all sorts of events, e.g., the emergence of more than one giant cluster or the presence of a non-trivial proportion of vertices in mesoscopic components. Moreover, the minimizers of the rate function represent the most likely configurations of the graph, which is expressed in terms of a law of large numbers for the objects that satisfy the LDP. In this way, we recover the mentioned phase transition and collect detailed information about the statistics of the sizes of all the components, both in the subcritical regime (where no giant component occurs) and the supercritical one.

Many investigations of the Erdős-Rényi graph and other random graphs in the sparse regime rely on approximations of subgraphs with certain Galton–Watson trees and other branching processes. We would like to stress that our approach does not use such arguments and is therefore an alternate ansatz.

In Section 1.1, we introduce our approach, in Section 1.2, we formulate our main results about large deviations and in Section 1.3 their consequences for the phase transition, and in Section 1.4 we give a literature survey.

Our original interest in this study was triggered by a desire to understand random particle processes with coagulation, in particular its simplest variant, the Marcus–Lushnikov model with multiplicative coagulation kernel. We introduce this process and its connections with our work on the LDP for the Erdős-Rényi graph in Section 1.5.

Another highly interesting connection appears with an LDP proof for the well-known Bose–Einstein condensation phase transition that appears in the free (i.e., non-interacting) Bose gas; we will explain the similarities and the differences in Section 1.6.

1.1. Micro- and macroscopic empirical measures

Let us introduce the main objects that we study in this paper. For the remainder of the paper, we fix $t\in(0,\infty)$ and will be working with the graph ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ , where $t_{N}=t+o(1)$ as $N\to\infty$ . By

$$S_{1}^{(N)}\geq S_{2}^{(N)}\geq\dots\geq S_{n}^{(N)}\geq 1,\qquad\sum_{i=1}^{n}S_{i}^{(N)}=N,$$

we denote the sizes of all the connected components of ${\mathcal{G}}(N,\frac{1}{N}t_{N})$, ordered in a decreasing way ($n\in\{1,\dots,N\}$ is the number of components). We want to describe the entire family $(S_{i}^{{{\scriptscriptstyle{({N}})}}})_{i\in\{1,\dots,n\}}$ in the limit $N\to\infty$. This is a comprehensive object, which contains several scales. In order to adequately describe the two most important scales, it will be convenient to work with two empirical measures of the component sizes in the microscopic and macroscopic size ranges:

$${\rm Mi}^{(N)}=\frac{1}{N}\sum_{i=1}^{n}\delta_{S_{i}^{(N)}}\qquad\mbox{and}\qquad{\rm Ma}^{(N)}=\sum_{i=1}^{n}\delta_{\frac{1}{N}S_{i}^{(N)}}.$$

Intuitively, while ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}$ registers the proportion of components of ‘microscopic’ sizes $1,2,3,\dots$ on the scale $N$ , ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ registers the components of ‘macroscopic’ sizes, i.e. order $N$ . Note that each of the two measures admits a one-to-one map onto the vector $(S_{i}^{{{\scriptscriptstyle{({N}})}}})_{i\in\{1,\dots,n\}}$ for fixed $N\in\mathbb{N}$ and therefore contains all the information contained in the vector. However, in the limit $N\to\infty$ , they will be able to describe only the statistics of the microscopic, respectively macroscopic, part of the particle configuration. We would like to stress here that this issue lies at the heart of the phase transition of the emergence of a giant component, i.e., a macroscopic size.
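The two empirical measures can be computed directly from a sampled graph. The following is a minimal Python sketch, assuming a naive $O(N^{2})$ edge sampler with $t_{N}=t$ and a union-find structure to track components; the function names `component_sizes`, `micro_measure` and `macro_measure` are illustrative and do not come from the paper.

```python
import random
from collections import Counter


def component_sizes(n_vertices, t, seed=0):
    """Sample G(N, t/N) and return the component sizes in decreasing order."""
    rng = random.Random(seed)
    p = t / n_vertices
    parent = list(range(n_vertices))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Each pair of distinct vertices is connected independently with probability p.
    for u in range(n_vertices):
        for v in range(u + 1, n_vertices):
            if rng.random() < p:
                root_u, root_v = find(u), find(v)
                if root_u != root_v:
                    parent[root_u] = root_v

    counts = Counter(find(v) for v in range(n_vertices))
    return sorted(counts.values(), reverse=True)


def micro_measure(sizes):
    """Mi^(N) as a dict: k -> (number of components of size k) / N."""
    n = sum(sizes)
    return {k: count / n for k, count in Counter(sizes).items()}


def macro_measure(sizes):
    """Ma^(N) as the list of its atoms S_i / N, in decreasing order."""
    n = sum(sizes)
    return [s / n for s in sizes]
```

For every finite $N$ both measures have first moment one, that is, `sum(k * w for k, w in micro_measure(sizes).items())` and `sum(macro_measure(sizes))` both equal 1 up to rounding; the loss of mass discussed in the text only appears in the limit $N\to\infty$.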

Here is a non-technical, intuitive explanation: in the limit as $N\to\infty$ , all sizes $S_{i}^{{{\scriptscriptstyle{({N}})}}}$ that somehow diverge, will vanish from the support of ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}$ ‘at infinity’, and all sizes $S_{i}^{{{\scriptscriptstyle{({N}})}}}$ that are $\ll N$ will vanish from the support of ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ ‘at zero’. Hence, ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}$ may leak out mass at infinity, and ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ at zero. It is by no means automatic that all the mass that leaks out from the microscopic part at infinity enters the macroscopic part at zero. In order to control that, we also need to take care of the mesoscopic mass, coming from particle masses $1\ll S_{i}^{{{\scriptscriptstyle{({N}})}}}\ll N$ . Since here a lot of scales are contained (indeed, a continuum of scales), we will not be able to say anything about these sizes, but only about the total proportion of vertices belonging to such components.

The famous phase transition (proved first in [ER60]) says that, for $t\leq 1$, there is no loss of mass from ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}$ (i.e., the first moment stays equal to one in the limit), and ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ converges to the zero measure (i.e., loses all its mass), while for $t>1$, the total mass of ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}$ loses a positive amount equal to that retained by ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ in the limit, and this results in a single Dirac measure. In both cases, the mesoscopic part vanishes: mesoscopic components are present in the graph with high probability, but the proportion of vertices they contain is negligible.

These are assertions of the type of laws of large numbers. However, in the setting of a large-deviation principle as we are working here, we will obtain significant results also about the probabilities of several very unlikely events, like the emergence of non-trivial mesoscopic total mass, of more than one giant cluster and of different statistics of microscopic component sizes. Note that we decided to work on probabilities that are on an exponential scale $N$ and for connection probabilities of the form $\frac{1}{N}(t+o(1))$ for general $t\in(0,\infty)$ . This excludes for example all (highly interesting) phenomena that occur with respect to cluster sizes of order $N^{2/3}$ when considering more specified connection probabilities of the size $\frac{1}{N}(1+cN^{-1/3})$ ; see [Ald97].

Now let us give a more technical explanation of the issue about possible losses of masses, which will also set the frame for the mathematical treatment. We will conceive the discrete measure ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}=({\rm Mi}^{{{\scriptscriptstyle{({N}})}}}_{k})_{k\in\mathbb{N}}$ as a random element of the sequence set ${\mathcal{N}}=\bigcup_{c\in[0,1]}{\mathcal{N}}(c)$ , where

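$${\mathcal{N}}(c)=\Big\{\Lambda=(\lambda_{k})_{k\in\mathbb{N}}\in[0,\infty)^{\mathbb{N}}\colon\sum_{k\in\mathbb{N}}k\lambda_{k}=c\Big\},\qquad c>0.$$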

We equip ${\mathcal{N}}=\{\Lambda\colon\sum_{k}k\lambda_{k}\leq 1\}$ with the topology of coordinate-wise convergence, which makes it compact by the Bolzano–Weierstrass theorem combined with Fatou’s lemma.

The point measure ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ is a random element of the set ${\mathcal{M}}:=\bigcup_{c\in[0,1]}{\mathcal{M}}_{\mathbb{N}_{0}}((0,1];c)$ , where

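$${\mathcal{M}}_{\mathbb{N}_{0}}((0,1];c)=\Big\{\alpha\in{\mathcal{M}}_{\mathbb{N}_{0}}((0,1])\colon\int_{(0,1]}x\,\alpha({\rm d}x)=c\Big\},$$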

and ${\mathcal{M}}_{\mathbb{N}_{0}}((0,1])$ is the set of all measures on $(0,1]$ with values in $\mathbb{N}_{0}=\left\{0\right\}\cup\mathbb{N}$ . We equip ${\mathcal{M}}$ with the topology that is induced by functionals of the form $\mu\mapsto\int_{(0,1]}f(x)\,\mu({\rm d}x)$ where $f\colon(0,1]\to\mathbb{R}$ is continuous and compactly supported. We sometimes write the elements of ${\mathcal{M}}$ as $\alpha=\sum_{j}\delta_{\alpha_{j}}$ with $1\geq\alpha_{1}\geq\alpha_{2}\geq\dots>0$ and $\sum_{j}\alpha_{j}\leq 1$ , where $j$ extends over a finite subset of $\mathbb{N}$ or over $\mathbb{N}$ . Then convergence is equivalent with the pointwise convergence of each of the atoms. By similar arguments as for ${\mathcal{N}}$ , also ${\mathcal{M}}$ is compact. We equip the product of ${\mathcal{N}}$ and ${\mathcal{M}}$ with the product topology, so that it is also compact.

Important quantities are the expectations of the sub-probability distributions $\Lambda\in{\mathcal{N}}$ respectively $\alpha\in{\mathcal{M}}$ , i.e., the maps

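$$\Lambda\mapsto c_{\Lambda}:=\sum_{k\in\mathbb{N}}k\lambda_{k}\qquad\mbox{and}\qquad\alpha\mapsto c_{\alpha}:=\int_{(0,1]}x\,\alpha({\rm d}x).$$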

Note that they are not continuous in the respective topologies, but only lower semicontinuous, according to Fatou’s lemma. Indeed, even though the microscopic and macroscopic expectations $c_{{\rm Mi}^{{{\scriptscriptstyle{({N}})}}}}=\sum_{k}k{\rm Mi}^{{{\scriptscriptstyle{({N}})}}}_{k}$ and $c_{{\rm Ma}^{{{\scriptscriptstyle{({N}})}}}}=\int_{(0,1]}x\,{\rm Ma}^{{{\scriptscriptstyle{({N}})}}}({\rm d}x)$ are each equal to one for any $N$ , they may (and will) lose mass in the limit $N\to\infty$ . We sometimes call $c_{\Lambda}$ and $c_{\alpha}$ the total masses of the microscopic, respectively macroscopic, configuration $\Lambda$ and $\alpha$ , since they stand for the total number of particles, after scaling.

The mathematical treatment of the mesoscopic part of the component sizes is more technical, as it requires the introduction of two cutting parameters $R\in\mathbb{N}$ and $\varepsilon\in(0,1)$ . Indeed, a size $S_{i}^{{{\scriptscriptstyle{({N}})}}}$ is called $(R,\varepsilon)$ -mesoscopic if $R<S_{i}^{{{\scriptscriptstyle{({N}})}}}<\varepsilon N$ , and the definition of mesoscopic sizes requires making the limit $N\to\infty$ , followed by $R\to\infty$ and $\varepsilon\downarrow 0$ . There are several scales (indeed, a continuum of scales) contained in this part and in this regime it does not seem reasonable to consider an empirical measure for this part; therefore we will consider only the total proportion of mesoscopic vertices.

Let us remark that our choice of considering exclusively the size of each component, disregarding its bond structure, comes from the interest in coagulation processes, where only the sizes matter; see Section 1.5. An extension of our work to empirical measures of the components seems to require only moderate additional work, at least as it concerns the microscopic part. See Section 1.4 for earlier LDP-investigation of the components as subgraphs.

1.2. Our results: large-deviations principles

In this section, we present all our results on the LDP satisfied by the empirical measure of statistics of component sizes of the Erdős–Rényi graph ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ , the random graph on $[N]=\{1,\dots,N\}$ with connection probability $\frac{1}{N}t_{N}$ , and we assume that $t_{N}=t+o(1)$ with fixed $t\in(0,\infty)$ . In Section 1.3 we will draw conclusions about the phase transition from that. Our main result is the following description of the two empirical measures ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}$ and ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ in terms of a joint large-deviations principle (LDP).

Theorem 1.1 (LDP for the empirical measures).

As $N\to\infty$ , the pair $({\rm Mi}^{{{\scriptscriptstyle{({N}})}}},{\rm Ma}^{{{\scriptscriptstyle{({N}})}}})$ satisfies a large-deviations principle with speed $N$ and rate function

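$$(\Lambda,\alpha)\mapsto I(\Lambda,\alpha;t)=\begin{cases}I_{\rm Mi}(\Lambda;t)+I_{\rm Ma}(\alpha;t)+(1-c_{\Lambda}-c_{\alpha})\Big(\frac{t}{2}-\log t\Big),&\mbox{if }c_{\Lambda}+c_{\alpha}\leq 1,\\ \infty&\mbox{otherwise,}\end{cases}$$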

where we write $\Lambda=(\lambda_{k})_{k\in\mathbb{N}}$ and

[TABLE]

The proof of this theorem is in Section 3; it is based on an explicit combinatorial formula for the joint distribution of all the component sizes, followed by analysis of the arising exponential rates. We organised the three terms of $I$ in the way in which they were derived from the influences of the three parts (micro, macro and meso) in the course of the proof, even though this leads to a cancellation of terms involving $c_{\Lambda}$ and $c_{\alpha}$ . This implies also that separate conclusions about the microscopic and the macroscopic parts can conveniently be made (see Corollaries 1.2 and 1.3). Informally, in (1.5) the terms involving $\lambda_{k}$ , ${\operatorname{e}}$ and $k!$ in the logarithm derive from the combinatorial number of possibilities to decompose $[N]$ into the requested configuration of subsets, the term $k^{k-2}$ and the $t$ in the logarithm stem from the probability that these subsets are connected, and all the other terms from the probability that any of these subsets is not connected with the remainder. This interpretation is not immediate, since a number of asymptotic manipulations have been made during the proof. Similar remarks apply to (1.6). Interestingly, on the right-hand side of (1.5) we see, up to normalization, a relative entropy of $(k\lambda_{k})_{k\in\mathbb{N}}$ with respect to the Borel distribution ${\rm Bo}_{\mu}(k)={\operatorname{e}}^{-\mu k}(\mu k)^{k-1}/k!$ for a particular choice of $\mu$ ; a fact that will be crucial in the analysis of minimizers of the rate function, see the proofs of Corollaries 1.2 and 1.3 and of Theorem 1.5.
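The Borel distribution mentioned above can be checked numerically. The sketch below, in Python, evaluates ${\rm Bo}_{\mu}(k)={\rm e}^{-\mu k}(\mu k)^{k-1}/k!$ in log space and sums it up to a cutoff. The cutoff `k_max` and the function names are assumptions made for illustration. That ${\rm Bo}_{\mu}$ is a probability distribution on $\mathbb{N}$ with mean $1/(1-\mu)$ for $\mu<1$ is a standard fact about the Borel distribution and is not stated in this section.

```python
import math


def borel_pmf(k, mu):
    """Bo_mu(k) = exp(-mu*k) * (mu*k)**(k-1) / k!, computed via logarithms."""
    log_p = -mu * k + (k - 1) * math.log(mu * k) - math.lgamma(k + 1)
    return math.exp(log_p)


def borel_total_and_mean(mu, k_max=2000):
    """Truncated total mass and mean of Bo_mu (k_max is an ad hoc cutoff)."""
    total = 0.0
    mean = 0.0
    for k in range(1, k_max + 1):
        p = borel_pmf(k, mu)
        total += p
        mean += k * p
    return total, mean
```

For $\mu$ well below one, such as $\mu=0.5$, the terms decay geometrically and the truncation error is negligible. At $\mu=1$ the tail decays only like $k^{-3/2}$, so a fixed cutoff converges slowly there.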

Let us recall the notion of an LDP: Theorem 1.1 says that, for any open set $G\subset{\mathcal{N}}\times{\mathcal{M}}$ respectively closed set $F\subset{\mathcal{N}}\times{\mathcal{M}}$ ,

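$$\begin{aligned}\liminf_{N\to\infty}\frac{1}{N}\log\mathbb{P}_{N}\big(({\rm Mi}^{(N)},{\rm Ma}^{(N)})\in G\big)&\geq-\inf_{(\Lambda,\alpha)\in G}I(\Lambda,\alpha;t),\\ \limsup_{N\to\infty}\frac{1}{N}\log\mathbb{P}_{N}\big(({\rm Mi}^{(N)},{\rm Ma}^{(N)})\in F\big)&\leq-\inf_{(\Lambda,\alpha)\in F}I(\Lambda,\alpha;t),\end{aligned}$$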

where we wrote $\mathbb{P}_{N}$ for the probability measure for ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ . For a comprehensive presentation of the theory of large-deviations, see e.g. [DZ10]. It is not difficult to see that since the rate function $I(\cdot,\cdot;t)$ is lower semicontinuous and ${\mathcal{N}}\times{\mathcal{M}}$ is compact, it is even a good rate function, i.e., its level sets $\{(\Lambda,\alpha)\colon I(\Lambda,\alpha;t)\leq r\}$ are compact for any $r$ .

It is well-known in the theory of large deviations (and easy to deduce from the LDP) that for many interesting sets $A\subset{\mathcal{N}}\times{\mathcal{M}}$ one also has that $\mathbb{P}_{N}(({\rm Mi}^{{{\scriptscriptstyle{({N}})}}},{\rm Ma}^{{{\scriptscriptstyle{({N}})}}})\in A)={\operatorname{e}}^{-N\inf_{A}I(\cdot;t)(1+o(1))}$ , for example for sets $A$ that are equal to the closure of their open kernel. There are choices of such sets that give the precise exponential rates of interesting events, for instance the event that there are a given number of components larger than $Na$ , for some $a>0$ , or that a given component size appears with a certain least density, or that a given positive percentage of vertices are contained in components of a given range of sizes (e.g., in $\{1,\dots,R\}$ or in $\{R,\dots,\varepsilon N\}$ or in $\{\varepsilon N,\dots,N\}$ ), and certainly all kinds of combinations of such events.

From our main result, the LDP in Theorem 1.1, a number of other LDPs follow via the contraction principle, according to which, if a random variable satisfies an LDP, then so does its image under a continuous transformation; see [DZ10]. Let us begin with the component size distribution of the microscopic part.

Corollary 1.2 (LDP for microscopic component size statistics).

As $N\to\infty$ , ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}$ satisfies an LDP with speed $N$ and rate function ${\mathcal{I}}_{\rm Mi}(\cdot;t)\colon{\mathcal{N}}\to[0,\infty]$ , given by

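$${\mathcal{I}}_{\rm Mi}(\Lambda;t)=\inf_{\alpha\in{\mathcal{M}}}I(\Lambda,\alpha;t)=I_{\rm Mi}(\Lambda;t)-(1-c_{\Lambda})\Big(\log\frac{1-{\rm e}^{(c_{\Lambda}-1)t}}{1-c_{\Lambda}}-\frac{c_{\Lambda}t}{2}\Big).$$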

The first equality comes from the application of the contraction principle; while the second equality is purely analytical and it is checked in Lemma 4.1. There it is seen that, given any $\Lambda\in{\mathcal{N}}$ , it is always optimal to have all the remaining mass $1-c_{\Lambda}$ in one single macroscopic component.

In the same way one can investigate the macroscopic part of the system.

Corollary 1.3 (LDP for macroscopic particles).

As $N\to\infty$ , ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ satisfies an LDP with speed $N$ and rate function ${\mathcal{I}}_{\rm Ma}(\cdot;t)\colon{\mathcal{M}}\to[0,\infty]$ , given by

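$$\begin{aligned}{\mathcal{I}}_{\rm Ma}(\alpha;t)&=\inf_{\Lambda\in{\mathcal{N}}}I(\Lambda,\alpha;t)\\ &=I_{\rm Ma}(\alpha;t)+(1-c_{\alpha})\Big(\frac{t}{2}-\log t\Big)+C_{\alpha,t}\Big(\log(tC_{\alpha,t})-\frac{t}{2}C_{\alpha,t}\Big),\end{aligned}$$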

where $C_{\alpha,t}=(1-c_{\alpha})\wedge\frac{1}{t}$ (recall $c_{\alpha}=\int_{0}^{1}x\,\alpha({\rm d}x)$ ).

Again, only the second equality has to be checked; this is done in Lemma 4.2. In contrast with the result above, here the optimal configuration $\Lambda^{*}$ depends on $\alpha\in{\mathcal{M}}$ , most heavily it depends on whether $1-c_{\alpha}\leq\frac{1}{t}$ or not. Indeed, if $1-c_{\alpha}\leq\frac{1}{t}$ , then $c_{\Lambda^{*}}=1-c_{\alpha}$ (and no mesoscopic part arises). However, if $1-c_{\alpha}>\frac{1}{t}$ , then $c_{\Lambda^{*}}=\frac{1}{t}$ , and a non-trivial mesoscopic mass arises in the minimization; see Theorem 1.5. This peculiarity shows already a key difference between the cases $t\leq 1$ and $t>1$ . Indeed, if $t\leq 1$ one cannot have any macroscopic mass distribution $\alpha$ such that $1-c_{\alpha}>\frac{1}{t}$ and no difference in the minimizing strategy of the system can be seen. This is a first way to see the phase transition at $t=1$ from analytic properties of the rate function.
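The case distinction in the preceding paragraph amounts to a simple split of the non-macroscopic mass $1-c_{\alpha}$. The Python sketch below illustrates it under the assumption that the mesoscopic mass is the remainder $1-c_{\alpha}-c_{\Lambda^{*}}$, as suggested by the third term of the rate function $I$; the function name `optimal_mass_split` is hypothetical.

```python
def optimal_mass_split(c_alpha, t):
    """Split the mass 1 - c_alpha into (microscopic, mesoscopic) parts.

    The microscopic part is C_{alpha,t} = min(1 - c_alpha, 1/t); the
    mesoscopic part is whatever remains (assumed to be 1 - c_alpha - C).
    """
    if not 0.0 <= c_alpha <= 1.0 or t <= 0.0:
        raise ValueError("need 0 <= c_alpha <= 1 and t > 0")
    remaining = 1.0 - c_alpha
    micro = min(remaining, 1.0 / t)
    meso = remaining - micro
    return micro, meso
```

For $t\leq 1$ the mesoscopic part returned is always zero, since $1-c_{\alpha}\leq 1\leq\frac{1}{t}$. For $t=2$ and $c_{\alpha}=0.1$ the split is $0.5$ microscopic and $0.4$ mesoscopic.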

Now we come to the mesoscopic part of the particle configuration. This part comprises particle sizes on all the scales between finite and $O(N)$ and it seems unreasonable to consider an empirical measure for it. Instead, we consider only the total mass of this mesoscopic part. Let $\varepsilon>0$ and $R\in\mathbb{N}$ be two auxiliary parameters, then we define the $(R,\varepsilon)$ -mesoscopic total mass as

$$\overline{\rm Me}_{R,\varepsilon}^{(N)}=\frac{1}{N}\sum_{i\colon R<S_{i}^{(N)}<\varepsilon N}S_{i}^{(N)}.$$

This is the proportion of vertices that are contained in components with a size strictly between $R$ and $\varepsilon N$. The mesoscopic total mass in a strict sense arises after taking the limits $N\to\infty$, followed by $\varepsilon\downarrow 0$ and $R\to\infty$, but this does not define a proper random variable. However, it is possible to formulate an LDP in the limit $N\to\infty$ and then to study the rate function, ${\mathcal{J}}_{\rm Me}^{{{\scriptscriptstyle{({R,\varepsilon}})}}}$, as $\varepsilon\downarrow 0$ and $R\to\infty$. Additionally, the proof of Theorem 1.1 shows that it is possible to define a coupled mesoscopic total mass $\overline{\rm Me}^{{{\scriptscriptstyle{({N}})}}}_{R_{N},\varepsilon_{N}}$ for any diverging sequence $R_{N}$ and vanishing sequence $\varepsilon_{N}$. This is a well-defined random variable; it satisfies an LDP with speed $N$, and its rate function is the limit of ${\mathcal{J}}_{\rm Me}^{{{\scriptscriptstyle{({R,\varepsilon}})}}}$ as $\varepsilon\downarrow 0$ and $R\to\infty$.
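For a concrete list of component sizes, the $(R,\varepsilon)$-mesoscopic total mass is a one-line computation. The Python sketch below assumes the sizes are given as a plain list of positive integers summing to $N$; the function name `mesoscopic_mass` is illustrative.

```python
def mesoscopic_mass(sizes, r, eps):
    """Fraction of vertices in components with R < size < eps * N."""
    n = sum(sizes)  # N, the total number of vertices
    return sum(s for s in sizes if r < s < eps * n) / n
```

As an example, for the sizes `[50, 30, 10, 5, 3, 1, 1]` (so $N=100$) with $R=2$ and $\varepsilon=0.2$, the components of sizes 10, 5 and 3 count as mesoscopic, giving a mass of $0.18$.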

Corollary 1.4 (LDP for mesoscopic mass).

(1)

For any $R\in\mathbb{N}$ and $\varepsilon\in(0,1)$ , as $N\to\infty$ , $\overline{\rm Me}^{{{\scriptscriptstyle{({N}})}}}_{R,\varepsilon}$ satisfies an LDP with speed $N$ and rate function $c\mapsto{\mathcal{J}}_{\rm Me}^{{{\scriptscriptstyle{({R,\varepsilon}})}}}(c;t)$ , where

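$${\mathcal{J}}_{\rm Me}^{(R,\varepsilon)}(c;t)=\inf\Big\{I(\Lambda,\alpha;t)\colon\sum_{k=1}^{R}k\lambda_{k}+\int_{\varepsilon}^{1}x\,\alpha({\rm d}x)=1-c\Big\}.$$

(2)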

For any $R_{N}\in\mathbb{N}$ and $\varepsilon_{N}\in(0,1)$ such that $1\ll R_{N}<\varepsilon_{N}N\ll N$ , and $|\frac{1}{\varepsilon_{N}}\log\varepsilon_{N}|\leq o(N)$ , the coupled mesoscopic total mass $\overline{\rm Me}^{{{\scriptscriptstyle{({N}})}}}_{R_{N},\varepsilon_{N}}$ satisfies an LDP with speed $N$ and rate function

$${\mathcal{J}}_{\rm Me}(c;t)=(1-c)\Big(\log\big((1-c)t\big)-\frac{(1-c)t}{2}\Big)+\frac{t}{2}-\log t.$$

The function ${\mathcal{J}}_{\rm Me}(c;t)$ is strictly increasing in $c$ , its minimum over $[0,1]$ is ${\mathcal{J}}_{\rm Me}(0;t)=0$ .

Corollary 1.4 part (1) is a simple consequence of the contraction principle, as the maps $\Lambda\mapsto\sum_{k=1}^{R}k\lambda_{k}$ and $\alpha\mapsto\int_{\varepsilon}^{1}x\,\alpha({\rm d}x)$ are continuous. Assertion (2) follows as a byproduct of our proof of Theorem 1.1 in Section 3.

Hence, ${\mathcal{J}}_{\rm Me}(\cdot;t)$ can rightfully be called the rate function for the mesoscopic total mass. Since it is positive everywhere outside $c=0$, we have the immediate consequence that the probability that any positive percentage of the vertices lies in mesoscopic components decays exponentially towards zero. This implies the convergence in probability of $\overline{\rm Me}_{R_{N},\varepsilon_{N}}^{{{\scriptscriptstyle{({N}})}}}$ towards zero, with exponential decay of the probability of a deviation by any positive amount. Interestingly, taking $R_{N}+1=\varepsilon_{N}N\in\mathbb{N}$, we see that already just one mesoscopic size alone satisfies the same LDP as the entire $(R,\varepsilon)$-mesoscopic total mass in the limit $R\to\infty$, $\varepsilon\downarrow 0$. The condition $|\frac{1}{\varepsilon_{N}}\log\varepsilon_{N}|\leq o(N)$ is not only a technical one, but implies that $\log N\ll\varepsilon_{N}N$, taking care of the well-known fact that there are many clusters of size $O(\log N)$ in the sparse Erdős-Rényi random graph that stem from an extreme-value statistics effect of the microscopic clusters.
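The properties of ${\mathcal{J}}_{\rm Me}(\cdot;t)$ stated in Corollary 1.4 are easy to check numerically. The Python sketch below assumes the closed form ${\mathcal{J}}_{\rm Me}(c;t)=(1-c)\big(\log((1-c)t)-\frac{(1-c)t}{2}\big)+\frac{t}{2}-\log t$, reading the logarithm as $\log((1-c)t)$, which is the reading under which ${\mathcal{J}}_{\rm Me}(0;t)=0$. The function name `j_me` and the convention $0\log 0=0$ at $c=1$ are choices made for this illustration.

```python
import math


def j_me(c, t):
    """Mesoscopic rate function J_Me(c; t) for c in [0, 1] and t > 0."""
    u = 1.0 - c
    # Use the convention 0 * log(0) = 0 at the endpoint c = 1.
    first = u * (math.log(u * t) - u * t / 2.0) if u > 0.0 else 0.0
    return first + t / 2.0 - math.log(t)
```

Writing $u=1-c$, the derivative of the first term with respect to $u$ is $\log(ut)+1-ut\leq 0$, with equality only at $ut=1$, which is consistent with ${\mathcal{J}}_{\rm Me}(\cdot;t)$ being strictly increasing in $c$.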

1.3. Our results: the phase transition in the light of the LDP

We now proceed with the study of the main phenomenon in the sparse Erdős-Rényi random graph: the phase transition of the emergence of a giant component. We will deduce it from our large-deviations rate functions from Section 1.2. The LDPs and the identification of their strict minimiser(s) lead to laws of large numbers for a number of random quantities. Indeed, it is a standard and simple fact from large-deviations theory that a random variable that satisfies an LDP with a good rate function that has precisely one minimizer converges in probability to that minimizer. We will exploit this fact to deduce laws of large numbers. As before, the parameter $t\in(0,\infty)$ will play the decisive role; recall that the connection probability $\frac{1}{N}t_{N}$ of the graph ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ was picked as $t_{N}=t+o(1)$ as $N\to\infty$ .

Consider the following functions of the total masses of the microscopic and macroscopic particles respectively:

[TABLE]

where $c\in[0,1]$ . These two functions are not entirely analogous to ${\mathcal{J}}_{\rm Me}(c;t)$ as rate functions for the total masses of the micro and the macro part, because the total masses both of ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}$ and ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ are equal to one. This is consistent with the fact that the contraction principle cannot be applied to total masses, as they are not continuous functions of the measures. However, they contain rather interesting information about the phase transition.

Theorem 1.5 (Microscopic total mass phase transition).

(1)

For any $c\in[0,1]$ ,

[TABLE]

Moreover, ${\mathcal{J}}_{\rm Mi}(c;t)={\mathcal{J}}_{\rm Ma}(1-c;t)$ .

(2)

For $t\in(0,1]$ and $c\in(0,1]$ , the minimum of ${\mathcal{N}}(c)\ni\Lambda\mapsto{\mathcal{I}}_{\rm Mi}(\Lambda;t)$ is attained precisely at $\Lambda^{*}(c;t)\in{\mathcal{N}}(c)$ given by

[TABLE]

and the minimum of the function $c\mapsto{\mathcal{J}}_{\rm Mi}(c;t)$ is attained precisely at $c=1$ with value ${\mathcal{J}}_{\rm Mi}(1;t)=0$ . Therefore the infimum

[TABLE]

is attained at $(\Lambda,\alpha)=(\Lambda^{*}(1;t),\mathbf{0})$ , where $\mathbf{0}=(0,0,\dots)$ .

(3)

For $t\in(1,\infty)$ , the minimum of the function $c\mapsto{\mathcal{J}}_{\rm Mi}(c;t)$ is attained at $c=\beta_{t}$ where $\beta_{t}\in(0,1)$ is the smallest positive solution to

[TABLE]

The infimum in (1.14) is attained precisely at $(\Lambda,\alpha)=(\Lambda^{*}(\beta_{t};t),(1-\beta_{t},0,0,\dots))$ .

The proof is found in Section 4.2.

The two different cases in (1.12) refer to the cases that the first minimum in (1.11) is attained or not. Indeed, for $c\leq\frac{1}{t}$ , the function ${\mathcal{I}}_{\rm Mi}(\cdot;t)$ is minimized in an optimal $\Lambda^{*}$ with $c_{\Lambda^{*}}=c$ . However, for $c>\frac{1}{t}$ this is not possible, but only minimizing sequences can be found that achieve a total mass of $\frac{1}{t}$ in the microscopic measure and displace the remaining mass $c-\frac{1}{t}$ to the mesoscopic part. This shows that the phase transition originates from the impossibility of picking an optimal microscopic configuration $\Lambda^{*}$ if its total mass $c_{\Lambda^{*}}$ is required to be too large; the threshold being $1/t$ . If this is exceeded, then a minimization can be done only with the help of some non-trivial mesoscopic part. As we mentioned above, the macroscopic configuration is always minimized in one single giant component.

The same effect is seen in ${\mathcal{I}}_{\rm Ma}(c;t)$ , where first an optimization over $\Lambda$ with $c_{\Lambda}\leq 1-c$ is performed, and such a balance between microscopic and mesoscopic mass can pop out if $1-c$ is large enough. Subsequently, optimizing over ${\mathcal{M}}_{\mathbb{N}_{0}}((0,1];c)$ is straightforward. The equality ${\mathcal{J}}_{\rm Mi}(c;t)={\mathcal{J}}_{\rm Ma}(1-c;t)$ follows from this.

In Theorem 1.5(2) and (3) we see the different behaviour for subcritical, respectively supercritical $t$ in terms of the microscopic configuration. Note that this configuration is actually given by

[TABLE]

where ${\rm Bo}_{\mu}$ is the Borel distribution with parameter $\mu\in[0,1]$ . We see that such an optimal $\Lambda^{*}(c;t)$ cannot be found if $c>\frac{1}{t}$ , and this is an admissible total mass only when $t>1$ , marking the threshold between subcritical and supercritical regime (otherwise, there is no relevant case distinction as to the value of $c$ ). In this way, the Borel distribution appears as the natural minimizer of the microscopic part of the rate function.

In earlier work (see [Pit90]), the appearance of the Borel distribution in this context came from the observation that ${\rm Bo}_{\mu}$ is the distribution of the total progeny of a Galton–Watson tree with offspring that is Poisson-distributed with parameter $\mu$ . The characterisation of the emerging cluster-size distribution $\Lambda^{*}$ was based on an approximation of the connected subgraphs by such trees and counting the total number of trees of a given size in the graph. This approximation argument was extended to a large-deviation setting in [BC15], see Section 1.4.
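The identification of ${\rm Bo}_{\mu}$ with the total progeny of a Poisson Galton–Watson tree can be checked by simulation. The following is a minimal sketch, assuming the standard form ${\rm Bo}_{\mu}(k)={\operatorname{e}}^{-\mu k}(\mu k)^{k-1}/k!$ of the Borel distribution and a subcritical parameter $\mu<1$ (so that every simulated tree is finite). The function names and the hand-written Poisson sampler are illustrative choices and not part of the paper.

```python
import math
import random


def borel_pmf(k: int, mu: float) -> float:
    """Borel(mu) probability of k: exp(-mu*k) * (mu*k)**(k-1) / k!."""
    log_p = -mu * k + (k - 1) * math.log(mu * k) - math.lgamma(k + 1)
    return math.exp(log_p)


def poisson_sample(mu: float, rng: random.Random) -> int:
    """Knuth's multiplication method for a Poisson(mu) variate (small mu only)."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def total_progeny(mu: float, rng: random.Random) -> int:
    """Total number of individuals in a Galton-Watson tree with Poisson(mu) offspring."""
    unexplored, total = 1, 1
    while unexplored > 0:
        children = poisson_sample(mu, rng)
        total += children
        unexplored += children - 1  # one individual explored, its children added
    return total


def empirical_progeny_pmf(mu: float, samples: int, seed: int = 0) -> dict:
    """Empirical distribution of the total progeny over `samples` independent trees."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(samples):
        size = total_progeny(mu, rng)
        counts[size] = counts.get(size, 0) + 1
    return {k: v / samples for k, v in counts.items()}
```

Comparing `empirical_progeny_pmf(0.5, 20000)` with `borel_pmf(k, 0.5)` for small `k` gives agreement up to the expected Monte Carlo error.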

Theorem 1.5 characterises the well-known phase transition of the emergence of a giant component at $t=1$ in terms of a natural notion that is familiar to statistical physics: as a non-analyticity of the limiting free energy for the total mass of the microscopic configuration, which is equal to the infimum of ${\mathcal{J}}_{\rm Mi}(\cdot;t)$ . Indeed, this function is zero in $[0,1]$ , but positive in $(1,\infty)$ .

Another characterisation of this phase transition is in terms of a law of large numbers. Indeed, combining Theorem 1.5 with the LDP in Theorem 1.1 one has

[TABLE]

In words, this means that, for any $k\in\mathbb{N}$ , $\frac{1}{N}$ times the number of components of size $k$ converges to $\lambda_{k}^{*}(1;t)$ in the subcritical regime and to $\lambda_{k}^{*}(\beta_{t};t)$ in the supercritical regime, while there is no macroscopic component in the first regime and there is precisely one macroscopic cluster of cardinality $\sim N(1-\beta_{t})$ in the second. All these statements are in the sense of convergence in probability, and the probability of a deviation by any positive amount even decays exponentially in $N$ .

One also sees that the cut-off versions of the total masses, $\sum_{k=1}^{R}k{\rm Mi}_{k}^{{{\scriptscriptstyle{({N}})}}}$ and $\int_{[\varepsilon,1]}x\,{\rm Ma}^{{{\scriptscriptstyle{({N}})}}}({\rm d}x)$ , converge towards the respective cut-off versions of the limits, and their limits as $R\to\infty$ and $\varepsilon\downarrow 0$ are $(1,0)$ for $t\leq 1$ and $(\beta_{t},1-\beta_{t})$ for $t>1$ .
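The limiting pair of microscopic and macroscopic mass can be evaluated numerically. The sketch below assumes that $\beta_{t}$ coincides with the classical extinction probability of a Galton–Watson process with Poisson($t$) offspring, that is, with the smallest positive solution of $\beta={\operatorname{e}}^{-t(1-\beta)}$. This is the standard equation for the giant component of the sparse Erdős-Rényi graph and is used here as an assumption, since the display of Theorem 1.5(3) is not reproduced in this text. The function name is hypothetical.

```python
import math


def smallest_fixed_point(t: float, tol: float = 1e-13, max_iter: int = 1_000_000) -> float:
    """Iterate beta -> exp(-t * (1 - beta)) starting from beta = 0.

    The map is increasing in beta, so the iterates increase monotonically
    to the smallest non-negative fixed point.
    """
    beta = 0.0
    for _ in range(max_iter):
        nxt = math.exp(-t * (1.0 - beta))
        if abs(nxt - beta) < tol:
            return nxt
        beta = nxt
    return beta  # not converged within max_iter (happens near t = 1)


if __name__ == "__main__":
    for t in (0.5, 1.5, 2.0, 3.0):
        beta = smallest_fixed_point(t)
        print(f"t={t}: microscopic mass {beta:.6f}, macroscopic mass {1 - beta:.6f}")
```

For $t<1$ the iteration converges to $1$ (no giant component), and for $t>1$ it stops at a value strictly below $1$. Convergence is slow close to the critical value $t=1$.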

1.4. Related works on LDPs for Erdős-Rényi graphs

Despite the extensive literature on the Erdős-Rényi graph, there are not many results about large deviations in the sparse regime. Here we summarize, to the best of our knowledge, the existing results and how they relate to our work.

Two LDPs for the size of the largest component and for the number of isolated vertices have been derived in [O’C98]. These are two quantities that are obviously functionals of our measures ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ , respectively of ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}$ . Indeed, the largest component is the mass of the largest atom of ${\rm Ma}^{{{\scriptscriptstyle{({N}})}}}$ , and the number of isolated vertices is equal to $N$ times ${\rm Mi}_{1}^{{{\scriptscriptstyle{({N}})}}}$ . Both of these functionals are continuous, so that the contraction principle applies. The approach of [O’C98] is a simplified version of our comprehensive approach for the joint distribution of all the component sizes, and consequently it leads to formulas for the rate functions that are contractions of our rate function, which is straightforward to see. This also explains the remark made about the lack of convexity of the rate function in [O’C98]. Indeed, contraction often ruins convexity and contracted rate functions are rarely convex. Hence, our work includes the results of [O’C98].

A route that is inspired by statistical physics is taken in [EMH04], where the distribution of the random graph is tilted with a parameter $q>1$ raised to the power of the number of components, properly normalized. The analysis of the free energy of the corresponding partition sum is carried out there. Via the well-known Laplace dualism, the result is essentially equivalent to an LDP for the number of components. This functional is equal to the continuous functional ${\rm Mi}^{{{\scriptscriptstyle{({N}})}}}(\mathbb{N})$ in our setting. The peculiarity of [EMH04] is that this model is put into relation with the $q$ -state Potts model in the limit $q\downarrow 1$ via diagrammatic expansion techniques. In particular, they derive limiting formulas for the size of the giant component, the degree distributions inside and outside the giant component, and the distribution of small component sizes.

We already mentioned that registering each component as a subgraph (rather than only as its size) would give a priori a much more detailed description, at least for any fixed $N$ . However, in the limit as $N\to\infty$ , in the LDP regime that we consider, only a few subgraph configurations survive: the microscopic components survive only as spanning trees, and only those macroscopic components survive that have an excess of edges of order $\Theta(N)$ . The first has been carried out in [BC15], the second in [Puh05].

Indeed, the macroscopic part of our LDP is covered in [Puh05]. The author gives an LDP for the joint distribution of the total number of components, the sequence of the sizes of macroscopic ones, and the sequence of corresponding numbers of the excess edges appended with zeros. Therefore the contraction of this LDP to the total number of components and the macroscopic sizes (see [Puh05, Corollary 2.1]) is equal to the contraction of our LDP from Theorem 1.1 to the LDP for $\left(\sum_{k}{\rm Mi}_{k}^{{{\scriptscriptstyle{({N}})}}},{\rm Ma}^{{{\scriptscriptstyle{({N}})}}}\right)$ . The same is of course true when considering exclusively macroscopic sizes (compare [Puh05, Corollary 2.2] with Corollary 1.3). The approach in [Puh05] goes along a very different route, involving recursive formulas for the graphs ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ as $N$ increases, and consequently the form of the rate function derived there is quite different from ours; it involves an additional minimization procedure. It would require some work to analytically check that it is identical to ours.

In [BC15], an LDP for the empirical measure of all the components rooted at the vertices is derived with an explicit rate function. The topology used there comes from a distance that looks only at intersections of graphs with bounded sets, so it can detect only microscopic components. In this way it is contained in our results. However, [BC15] considers the components as graphs, not only as sizes, and therefore gets a much more detailed picture. Moreover, the object described in [BC15] is a size-biased version of our microscopic measure, since we are counting components of a certain size, while they consider the component containing each vertex and therefore count a certain component proportionally to the number of vertices it contains. Hence, the LDP of [BC15] contains the microscopic part of our LDP (Corollary 1.2) via the contraction principle, but a certain normalization has to be performed to actually compare the two objects. However, let us notice that [BC15, Theorem 1.8] shows that the rate function takes the form of a sort of relative entropy with respect to a Galton–Watson tree with Poisson offspring distribution (plus additional constants). This form is shown also in our contracted rate function from Corollary 1.2, which in (4.2) we rewrite in terms of a relative entropy with respect to a distribution related to the Borel distribution (which is precisely the distribution of the total progeny of a Galton–Watson tree with Poisson offspring). Also in this case, one sees that the Borel distribution appears as a size-biased version of our reference distribution.

Under assumptions that imply that the connection probability of the Erdős-Rényi graph ${\mathcal{G}}(N,p)$ satisfies $p\gg N^{-\frac{1}{2}}$ , recent progress has been made on the upper tails of sub-graph counts [CD16, Aug18, CD18].

In the case of dense graphs, that is, for ${\mathcal{G}}(N,p)$ with fixed $p\in(0,1)$ , there is a complete treatment thanks to Chatterjee and Varadhan [CV11], see [Cha16] for an overview. This regime is rather different from the sparse regime, since a proper formulation of the relevant limiting objects requires an abstract setting revolving around the notion of a graphon.

1.5. Application to coagulation models

Our interest in this research came from the desire to understand dynamical particle systems with coagulation in the large-system limit. It turned out that one of the most prominent (and most simple) models, the Marcus–Lushnikov model of coagulation, see [Mar68, Gil72, Lus78], admits a one-to-one correspondence to the component sizes of the Erdős-Rényi random graph that we study in this paper. This coagulation process is a continuous-time Markov process of vectors of particle masses $(S_{i}^{{{\scriptscriptstyle{({N}})}}}(t))_{i\in\{1,\dots,n_{t}\}}\in(\mathbb{N}_{0})^{n_{t}}$ at time $t\in[0,\infty)$ , arranged in descending order, precisely as in (1.1), where $n_{t}$ is the number of particles at time $t$ . This process is specified by the initial configuration, which we take in the monodisperse case, i.e., $S_{i}^{{{\scriptscriptstyle{({N}})}}}(0)=1$ for all $i=1,\dots,N=n_{0}$ , and by the transition mechanism, which is given in terms of a symmetric, non-negative coagulation kernel $K_{N}\colon\mathbb{N}\times\mathbb{N}\to[0,\infty)$ . That is, we start with $N$ particles of unit mass at time 0, and in the course of the process, each (unordered) pair of particles with respective masses $m,\widetilde{m}\in\mathbb{N}$ coagulate to a particle of mass $m+\widetilde{m}$ with rate $K_{N}(m,\widetilde{m})$ , independently of all the other pairs of particles.

The important special case of the multiplicative kernel, $K_{N}(m,\widetilde{m})=m\widetilde{m}/N$ , has two interesting features: (1) it can be mapped onto the Erdős-Rényi random graph that we study in this paper, and (2) it exhibits an interesting gelation phase transition in the limit $N\to\infty$ at time $t=1$ , because a gel, i.e., a particle of macroscopic size, appears. Indeed, for this process, it is noted in the review [Ald99] that, for fixed $t\in[0,\infty)$ , the distribution of the family $(S_{i}^{{{\scriptscriptstyle{({N}})}}})_{i\in\{1,\dots,n\}}$ of the component sizes of ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ defined in (1.1) with $\frac{1}{N}t_{N}=1-{\operatorname{e}}^{-t/N}$ is identical to the distribution of the family $(S_{i}^{{{\scriptscriptstyle{({N}})}}}(t))_{i\in\{1,\dots,n_{t}\}}$ of particle masses in the multiplicative coalescent at time $t$ . This correspondence was not mentioned in [BP90], but was discussed one year later in [BP91], which highlights the connection between gelation in the coagulation process and the phase transition given by the formation of a giant connected component in the Erdős-Rényi random graph [ER60].
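The choice $\frac{1}{N}t_{N}=1-{\operatorname{e}}^{-t/N}$ is the probability that an exponential clock of rate $1/N$ has rung by time $t$, and it satisfies $t_{N}=t+o(1)$, so the LDP applies to it. The following minimal sketch evaluates $t_{N}$ and the first-order correction $t-t_{N}\approx t^{2}/(2N)$; the function name is hypothetical.

```python
import math


def coupled_parameter(N: int, t: float) -> float:
    """t_N = N * (1 - exp(-t / N)), computed with expm1 to avoid cancellation."""
    return -N * math.expm1(-t / N)


if __name__ == "__main__":
    t = 2.0
    for N in (10, 1_000, 1_000_000):
        t_N = coupled_parameter(N, t)
        # N * (t - t_N) should approach t**2 / 2 as N grows.
        print(N, t_N, N * (t - t_N))
```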

Hence, the results of this paper also recover the gelation phenomenon in rather explicit terms through an LDP in terms of the microscopic, mesoscopic and macroscopic parts, in the same way as we explained in the above sections for the Erdős-Rényi random graph.

Smoluchowski introduced a (deterministic) ODE model for the concentrations of coagulating particles in the course of his work on Brownian motion [vS16]. Indeed, it is reasonable to assume that $l_{k}(t)=\lim_{N\rightarrow\infty}\frac{1}{N}\#\{\text{particles of size }k\text{ at time }t\}$ exists under suitable conditions, see [Nor99, LM04, MN14]. These limits satisfy

[TABLE]

where $K(m,\widetilde{m})=\lim_{N\rightarrow\infty}NK_{N}(m,\widetilde{m})$ is the limiting coagulation kernel (in our case, $K(m,\widetilde{m})=m\widetilde{m}$ ). This is the famous Smoluchowski equation. Intuitively, the positive terms on the right-hand side of (1.16) take into account that the fraction of particles of mass $k$ increases if a particle of mass $m$ and one of mass $\widetilde{m}$ (with $m+\widetilde{m}=k$ ) merge and this happens with rate $K(m,\widetilde{m})$ . On the other hand, the negative term describes that a particle of mass $k$ can coagulate with particles of any size $m$ with rate $K(k,m)$ and this is why it involves an infinite sum (over all $m\in\mathbb{N}$ ). One can check for $t\leq 1$ that $\Lambda^{*}(1;t)$ appearing in Theorem 1.5 is the exact solution of (1.16), the Smoluchowski equation, also given in [Ald99, Table 2]. As a consequence, the above mentioned gelation phase transition as well as the solution of the Smoluchowski ODE are also clear from our results in Sections 1.2 and 1.3 and receive therefore a new interpretation in terms of combinatorial structures.
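The claim that the subcritical minimizer solves the Smoluchowski equation can be checked numerically. The sketch below rests on two assumptions that are not spelled out in this text: that (1.16) has the standard form $\frac{{\rm d}}{{\rm d}t}l_{k}=\frac{1}{2}\sum_{m+\widetilde{m}=k}K(m,\widetilde{m})l_{m}l_{\widetilde{m}}-l_{k}\sum_{m}K(k,m)l_{m}$, and that $\lambda^{*}_{k}(1;t)=k^{k-2}t^{k-1}{\operatorname{e}}^{-kt}/k!$, the classical pre-gelation solution for the multiplicative kernel with monodisperse initial condition. All function names are hypothetical.

```python
import math


def l_k(k: int, t: float) -> float:
    """Candidate solution k^(k-2) t^(k-1) exp(-k t) / k!, evaluated in log space."""
    log_val = (k - 2) * math.log(k) + (k - 1) * math.log(t) - k * t - math.lgamma(k + 1)
    return math.exp(log_val)


def dl_k_dt(k: int, t: float) -> float:
    """Exact time derivative of l_k, namely l_k * ((k - 1) / t - k)."""
    return l_k(k, t) * ((k - 1) / t - k)


def smoluchowski_rhs(k: int, t: float, cutoff: int = 2000) -> float:
    """Right-hand side for the kernel K(m, m') = m * m' (infinite sum truncated)."""
    # Gain: pairs (m, k - m) merging into a particle of mass k.
    gain = 0.5 * sum(m * (k - m) * l_k(m, t) * l_k(k - m, t) for m in range(1, k))
    # Loss: a particle of mass k merging with any other particle.
    total_mass = sum(m * l_k(m, t) for m in range(1, cutoff + 1))
    loss = k * l_k(k, t) * total_mass
    return gain - loss


if __name__ == "__main__":
    t = 0.5
    for k in range(1, 6):
        print(k, dl_k_dt(k, t), smoluchowski_rhs(k, t))
```

The truncation of the loss term is harmless for $t<1$, where the summands $m\,l_{m}(t)$ decay geometrically and sum to one.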

In the light of the process character of the Marcus–Lushnikov coagulation model, it will be desirable to derive a pathwise version of the LDP of Theorem 1.1. This will require a version of that theorem which starts from an arbitrary configuration rather than from $S_{i}^{{{\scriptscriptstyle{({N}})}}}(0)=1$ . This may also be interesting for the time-dependent version of the Erdős-Rényi graph $({\mathcal{G}}(N,1-{\operatorname{e}}^{-t/N}))_{t\in[0,\infty)}$ , but it is not as natural there as for the Marcus–Lushnikov model. Another aspect that makes it particularly interesting for coagulation models is the availability of alternative methods in the spirit of Wentzell–Freidlin theory to derive pathwise LDPs for coagulation models, see [MPPR17]. Let us also mention that, in the renowned paper [Ald97], time is expanded around the critical value $t=1$ , and the mesoscopic components of the graph are compared to a stochastic process known as the multiplicative coalescent. Although we allow for fluctuations around $t$ (our LDP holds for any sequence $t_{N}\sim t$ ), we cannot capture this regime around $t=1$ , since we expect an LDP for mesoscopic particles to hold on a different scale. Another natural direction of our future research is an extension to the case of an inhomogeneous Erdős-Rényi graph as introduced in [BJR07]. We defer these questions to future work.

1.6. Comparison to Bose-Einstein condensation without interaction

Our large-deviations approach to the Marcus–Lushnikov models shows remarkable similarities to another well-known phase transition in a non-spatial model, the non-interacting Bose gas. Here the situation is similar in that the gas can be conceived as a joint distribution of $N$ particles that are randomly grouped into smaller units, called cycles, which can become arbitrarily large. The natural question is then, under what circumstances do macroscopic cycles arise. An explicit answer in terms of a large-deviations analysis has been given in [Ada08], where the transition, the famous Bose–Einstein Condensation (BEC) in dimensions $d\geq 3$ , is derived from the minimization of the rate function, in a way analogous to that in our Theorem 1.5. The two phase transitions differ in that the BEC transition is of saturation type, while the gelation transition is not.

For the non-interacting Bose gas in the thermodynamic limit at temperature $1/\beta\in(0,\infty)$ with particle density $\rho\in(0,\infty)$ the partition function is given by

[TABLE]

where $\Lambda_{N}$ is the centred box in $\mathbb{R}^{d}$ with volume $N/\rho$ . The free energy per particle is then

[TABLE]

For the Erdős-Rényi graph ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ , the equivalent quantity is the rate function ${\mathcal{I}}_{\rm Mi}$ from (1.7). The key difference between the rate functions is that only ${\mathcal{I}}_{\rm Mi}$ contains terms in the total mass of microscopic components, $c_{\Lambda}$ . This reflects the fact that the giant component makes a significant contribution to the rate function in the graph model, but the condensate in the non-interacting Bose gas does not.

The respective minimisers of $I_{\rm Mi}$ and $I$ are

[TABLE]

where $c$ and $\alpha$ control the values of $\sum_{k}k\lambda_{k}$ .

The crucial parameters are $t$ for the graph model and the inverse temperature $\beta$ for the Bose gas. Both models have a trivial upper bound for the total microscopic mass, $\sum_{k}k\lambda_{k}$ , namely one. One additional upper bound arises in each model from the optimisation of the rate function with respect to the $\lambda_{k}$ , but these are not relevant, until $t$ respectively $\beta$ rises to its critical value. For the graph model this bound is $1/t$ , because $\sum_{k}\frac{(ct{\operatorname{e}}^{-ct})^{k}}{k^{1-k}\,k!}\leq 1$ for all $ct\in(0,\infty)$ , and the summands take their maxima at $ct=1$ , when they correspond to the Borel probability distribution with parameter 1. For $\Lambda^{{{\scriptscriptstyle{({\rm BEC}})}}}$ this bound is $\rho^{-1}(4\pi\beta)^{-d/2}\sum_{k}k^{-\frac{d}{2}}$ . At this point we see a difference between the two models, because the total microscopic mass in the Bose gas remains on this bound as $\beta$ rises further, while for the graph model it immediately drops strictly below the bound. This explains why BEC is known as a saturation phase transition, but this description cannot be applied to gelation.
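The bound on the series $\sum_{k}\frac{(x{\operatorname{e}}^{-x})^{k}}{k^{1-k}\,k!}$ with $x=ct$ can be explored numerically. The sketch below sums the series in log space. For $x\leq 1$ the sum equals $x$ (the normalization of the Borel distribution), while for $x>1$ it equals the conjugate value $x^{*}<1$ determined by $x^{*}{\operatorname{e}}^{-x^{*}}=x{\operatorname{e}}^{-x}$; both statements are standard facts about the tree function and are stated here as background, not taken from the paper. The function name is hypothetical.

```python
import math


def borel_series(x: float, terms: int = 200_000) -> float:
    """Partial sum of sum_k (x e^{-x})^k k^(k-1) / k! over k = 1..terms."""
    total = 0.0
    for k in range(1, terms + 1):
        log_term = k * (math.log(x) - x) + (k - 1) * math.log(k) - math.lgamma(k + 1)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    for x in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(x, borel_series(x))
```

At $x=1$ the terms decay only like $k^{-3/2}$, so the partial sum approaches $1$ slowly from below; away from $x=1$ the convergence is geometric.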

2. Preparations for the proof of the LDP

We consider the Erdős-Rényi graph ${\mathcal{G}}={\mathcal{G}}(N,p)$ under the corresponding probability measure $\mathbb{P}_{N,p}$ . In Section 2.1, we derive an explicit formula for the distribution of the empirical measure of the component sizes $S_{i}^{{{\scriptscriptstyle{({N}})}}}$ in terms of connectivity probabilities for (smaller) Erdős-Rényi random graphs. Furthermore, we prepare in Section 2.2 for the asymptotic analysis for $p=\frac{1}{N}t_{N}$ with $t_{N}=t+o(1)$ by recalling from [Ste70] some estimates and asymptotics for this connectivity probability.

2.1. The joint distribution of the component sizes

An important quantity is

[TABLE]

We will be concerned with this quantity for fixed $k$ , but with connection probability $p=\frac{1}{N}t_{N}$ , in the limit $N\to\infty$ .

We define the state space of the collection of component sizes as

[TABLE]

To each element $(s_{i})_{i}$ of the state space $E_{N}$ , we associate a unique element of the space

[TABLE]

where for each $k$ , $\ell_{k}$ is the number of indices $i$ such that $s_{i}=k$ . The map $(s_{i})_{i}\mapsto\ell$ is a bijection and in the following we refer to configurations equally in terms of $(s_{i})_{i}$ or $\ell$ .

We denote by ${\mathcal{P}}_{N}$ the set of all partitions of $[N]=\{1,\dots,N\}$ . We write $B_{k}(\pi)$ for the number of sets in $\pi\in{\mathcal{P}}_{N}$ with cardinality $k$ . Then we can describe the joint distribution of all the component sizes of ${\mathcal{G}}(N,p)$ as follows.

Lemma 2.1.

For any $p\in[0,1]$ , $N\in\mathbb{N}$ and every $(s_{i})_{i}\in E_{N}$ ,

[TABLE]

Proof.

A set $A\subset\{1,\dots,N\}$ of vertices is a connected component in the graph ${\mathcal{G}}(N,p)$ if and only if (1) no bond between any vertex in $A$ and any vertex outside has been put, and (2) the subgraph formed out of the vertices in $A$ and all the bonds between any two vertices in $A$ is connected. This has probability $(1-p)^{|A|\,|A^{\rm c}|}\times\mu_{|A|}(p)$ . Applying this reasoning to $A^{\rm c}$ and describing the next component, and iterating this argument, shows that the product of the two products on the right-hand side of (2.4) is equal to the probability, for a given partition $\pi$ with $\ell_{k}$ sets of size $k$ for any $k$ , that the components of ${\mathcal{G}}(N,p)$ are precisely the sets of $\pi$ . Since this probability depends only on the cardinalities, the counting term completes the formula. ∎
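The first step of this argument also yields a recursion for the connection probability. Assuming, as the proof suggests, that $\mu_{k}(p)$ denotes the probability that ${\mathcal{G}}(k,p)$ is connected, summing the probability $(1-p)^{|A|\,|A^{\rm c}|}\mu_{|A|}(p)$ over all sets $A$ that contain a fixed vertex gives $1=\sum_{j=1}^{k}\binom{k-1}{j-1}\mu_{j}(p)(1-p)^{j(k-j)}$. The following minimal sketch solves this recursion; the function name is hypothetical.

```python
from math import comb


def connection_probabilities(k_max: int, p: float) -> list:
    """Return [mu_1(p), ..., mu_{k_max}(p)], with mu_k = P(G(k, p) is connected)."""
    mu = [1.0]  # mu_1 = 1: a single vertex is connected
    for k in range(2, k_max + 1):
        # Probability that the component of a fixed vertex has size j < k:
        # choose its other j - 1 vertices, require it to be connected,
        # and forbid all j * (k - j) bonds to the remaining vertices.
        not_connected = sum(
            comb(k - 1, j - 1) * mu[j - 1] * (1.0 - p) ** (j * (k - j))
            for j in range(1, k)
        )
        mu.append(1.0 - not_connected)
    return mu


if __name__ == "__main__":
    print(connection_probabilities(6, 0.5))
```

In floating point this recursion loses accuracy when $p$ is very small, because $\mu_{k}(p)$ is then obtained as a difference of numbers close to one; exact rational arithmetic avoids this.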

Now we rewrite the right-hand side of (2.4) in terms of the empirical measure of $(s_{i})_{i}$ , i.e., of the numbers $\ell_{k}$ of indices $i$ such that $s_{i}=k$ . Introduce the event

[TABLE]

Corollary 2.2.

For any $p\in[0,1]$ , $N$ and any $\ell=(\ell_{k})_{k}\in\mathbb{N}_{0}^{\mathbb{N}}$ satisfying $\sum_{k}k\ell_{k}=N$ ,

[TABLE]

Proof.

Note that the last product on the right-hand side of (2.4) can also be written as $\prod_{i}(1-p)^{\frac{1}{2}s_{i}(N-s_{i})}$ , since every unordered pair of distinct components is counted twice in $\sum_{i}s_{i}(N-s_{i})$ . Hence, if $\ell_{k}$ is equal to the number of $i$ such that $s_{i}=k$ for any $k$ , then the product of the last two products can be written as

[TABLE]

The counting term is easily identified as

[TABLE]

Substituting ends the proof. ∎
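Both ingredients of this proof can be verified by brute force for small $N$. The sketch below assumes that the counting term is the classical number $N!/\prod_{k}\big((k!)^{\ell_{k}}\ell_{k}!\big)$ of partitions of $[N]$ with $\ell_{k}$ sets of size $k$ (the display itself is not reproduced in this text), and it checks the identity $\sum_{i<j}s_{i}s_{j}=\frac{1}{2}\sum_{i}s_{i}(N-s_{i})$ behind the factor $\frac{1}{2}$ in the exponent. All function names are hypothetical.

```python
from collections import Counter
from math import factorial


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def profile(partition):
    """Block-size profile as a sorted tuple of pairs (size k, multiplicity l_k)."""
    return tuple(sorted(Counter(len(block) for block in partition).items()))


def count_formula(N, prof):
    """N! / prod_k (k!^{l_k} * l_k!) for a profile of pairs (k, l_k)."""
    denom = 1
    for k, l_k in prof:
        denom *= factorial(k) ** l_k * factorial(l_k)
    return factorial(N) // denom


def brute_force_counts(N):
    """Number of partitions of {0, ..., N-1} for each block-size profile."""
    return Counter(profile(pi) for pi in set_partitions(list(range(N))))


def cross_pairs(sizes):
    """Number of vertex pairs lying in two different blocks."""
    return sum(
        sizes[i] * sizes[j]
        for i in range(len(sizes))
        for j in range(i + 1, len(sizes))
    )
```

Calling `brute_force_counts(6)` and comparing each entry with `count_formula(6, prof)` confirms the count for all profiles of $N=6$.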

(To avoid confusion, we note that there is a typographical error in Section 4.5 of [Ald99], where the factor of $\frac{1}{2}$ is missing in the exponent of (2.6).)

2.2. The probability of being connected

Our analysis of (2.6) will depend crucially on an analysis of $\mu_{k}(\frac{1}{N}t_{N})$ . The next two lemmas collect results from [Ste70, Lemma 1&2, Theorem 1].

Lemma 2.3 (Bounds and asymptotics for $\mu_{k}(\frac{1}{N}t_{N})$ , [Ste70]).

For any $p\in[0,1]$ , $N\in\mathbb{N}$ and any $k\leq N$ ,

[TABLE]

In particular, if $p=\frac{1}{N}t_{N}$ with $t_{N}=t+o(1)$ and $k=o(\sqrt{N})$ ,

[TABLE]

The expression for the upper bound in (2.7) appears to be present (using somewhat applied chemical language) in [Flo41, equation (5)].
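The tree-type behaviour of $\mu_{k}(\frac{1}{N}t_{N})$ for fixed $k$ can be illustrated with exact arithmetic. The displays of Lemma 2.3 are not reproduced in this text, so the sketch below only assumes the classical spanning-tree bound $\mu_{k}(p)\leq k^{k-2}p^{k-1}$ (a union bound over the $k^{k-2}$ labelled trees given by Cayley's formula) and computes the ratio of $\mu_{k}(t/N)$ to this bound, which tends to one as $N\to\infty$. It uses the recursion obtained by summing over the component of a fixed vertex; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def mu_exact(k_max: int, p: Fraction) -> list:
    """Exact [mu_1(p), ..., mu_{k_max}(p)] via the component-of-a-vertex recursion."""
    mu = [Fraction(1)]
    for k in range(2, k_max + 1):
        not_connected = sum(
            comb(k - 1, j - 1) * mu[j - 1] * (1 - p) ** (j * (k - j))
            for j in range(1, k)
        )
        mu.append(1 - not_connected)
    return mu


def tree_ratio(k: int, t: Fraction, N: int) -> float:
    """mu_k(t/N) divided by the spanning-tree bound k^(k-2) (t/N)^(k-1)."""
    p = t / N
    mu_k = mu_exact(k, p)[k - 1]
    tree_bound = Fraction(k) ** (k - 2) * p ** (k - 1)
    return float(mu_k / tree_bound)


if __name__ == "__main__":
    t = Fraction(3, 2)
    for N in (10, 100, 10_000, 1_000_000):
        print(N, tree_ratio(5, t, N))
```

Rational arithmetic is used because $\mu_{k}(t/N)$ is of order $N^{-(k-1)}$, far below the resolution of a floating-point difference of numbers close to one.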

The following is an alternative upper bound for $\mu_{k}(\frac{1}{N}t_{N})$ , which will be required for macroscopic components, together with an asymptotic result for the connection probability in the so-called sparse case, where the bond probability is proportional to the inverse of the size of the graph.

Lemma 2.4 ([Ste70]).

For all $p\in[0,1]$ and $k\in\mathbb{N}$

[TABLE]

with $q=\log(1-p)$ . Moreover, for $\alpha\in(0,1)$ and $t\in(0,\infty)$ and a sequence $t_{N}=t+o(1)$ , as $N\rightarrow\infty$ ,

[TABLE]

3. Proof of the LDP

In this section we prove the main result of this paper, the large-deviations principle in Theorem 1.1. Again, we fix the parameter $t\in(0,\infty)$ and a sequence $t_{N}=t+o(1)$ and consider the Erdős-Rényi graph ${\mathcal{G}}(N,\frac{1}{N}t_{N})$ with probability measure $\mathbb{P}_{N,\frac{1}{N}t_{N}}$ .

Recall the topological remarks on the two state spaces ${\mathcal{N}}$ and ${\mathcal{M}}$ from Section 1.1. The metrics $d$ on ${\mathcal{N}}$ and $D$ on ${\mathcal{M}}$ , defined by

[TABLE]

induce the respective topologies of pointwise and vague convergence. We write $B_{\delta}(\Lambda)$ , respectively $B_{\rho}(\alpha)$ , for the closed $\delta$ -ball around $\Lambda$ , respectively for the closed $\rho$ -ball around $\alpha$ . Since the rate function $I(\cdot;t)$ is lower semicontinuous in ${\mathcal{N}}\times{\mathcal{M}}$ and the space is compact, we know that it is a good rate function (i.e., its level sets are not only closed but also compact). Therefore, a weak LDP implies our main result, the LDP in Theorem 1.1, and it will be sufficient to prove the following.

Proposition 3.1.

For any $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ ,

[TABLE]

We split the proof of Proposition 3.1 in several lemmas and finish it at the end of Section 3. We start by bounding the cardinality of ${\mathcal{N}}_{N}$ .

Lemma 3.2.

Let ${\mathcal{N}}_{N}$ be as defined in (2.3), then

[TABLE]

Proof.

The following is an argument in [Ada08]. For any $\ell\in{\mathcal{N}}_{N}$ , the set $H(\ell)=\{k\in\mathbb{N}\colon\ell_{k}>0\}$ has no more than $2\sqrt{N}$ elements, since

[TABLE]

Hence,

[TABLE]

∎
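Since the elements of ${\mathcal{N}}_{N}$ correspond to the integer partitions of $N$, the size of ${\mathcal{N}}_{N}$ can be computed exactly for moderate $N$. The following minimal sketch, with hypothetical function names, counts partitions by dynamic programming and computes the largest possible number of distinct component sizes; it is meant only to illustrate that $\frac{1}{N}\log|{\mathcal{N}}_{N}|$ becomes small, which is all that is needed on the exponential scale $N$.

```python
import math


def partition_numbers(n_max: int) -> list:
    """p[n] = number of integer partitions of n, for 0 <= n <= n_max."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        # Allow parts of size `part` in addition to the smaller ones.
        for n in range(part, n_max + 1):
            p[n] += p[n - part]
    return p


def max_distinct_sizes(N: int) -> int:
    """Largest d with 1 + 2 + ... + d <= N, i.e. the maximal size of H(l)."""
    d = 0
    while (d + 1) * (d + 2) // 2 <= N:
        d += 1
    return d


if __name__ == "__main__":
    p = partition_numbers(1000)
    for N in (10, 100, 1000):
        print(N, math.log(p[N]) / N, max_distinct_sizes(N), 2 * math.sqrt(N))
```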

Thanks to Lemma 3.2, it will be sufficient to get estimates on $\mathbb{P}_{N,\frac{1}{N}t_{N}}(A_{N}(\ell))$ , for any $\ell\in{\mathcal{N}}_{N}$ close enough to a fixed $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ . The strategy is to divide the terms in the product representation from Corollary 2.2 into three groups, which we refer to as micro-, meso- and macroscopic, because they take into account the contribution of, respectively, micro-, meso- and macro- components. We fix two increasing sequences $R_{N}$ and $\varepsilon_{N}N$ in $\mathbb{N}$ such that $R_{N}\nearrow\infty$ , $\varepsilon_{N}\downarrow 0$ and $R_{N}<\varepsilon_{N}N$ . We write

[TABLE]

where

[TABLE]

and

[TABLE]

Let us set

[TABLE]

Note that the sum of these three terms is equal to one. For the factor $N!$ , we use Stirling’s formula $N!=(\frac{N}{{\operatorname{e}}})^{N}{\operatorname{e}}^{o(N)}$ so that uniformly in $\ell\in{\mathcal{N}}_{N}$

[TABLE]

We will consider a cut-off version of the distances $d$ and $D$ introduced in (3.1), as follows:

[TABLE]

such that for a given $\ell\in{\mathcal{N}}_{N}$ we can measure simultaneously the distance of its microscopic part from $\Lambda$ and its macroscopic part from $\alpha$ . We also introduce some new notation: for any $\ell\in{\mathcal{N}}_{N}$ we write $\frac{1}{N}\ell$ for the sequence $\left(\frac{\ell_{k}}{N}\right)_{k\in\mathbb{N}}$ , which clearly is an element of ${\mathcal{N}}$ . On the other hand, with $\ell_{\lfloor\cdot N\rfloor}$ we denote the point measure on $(0,1]$ with weight $\ell_{k}$ at the point $\frac{k}{N}$ for any $k=1,\dots,N$ and zero everywhere else. This integer-valued measure clearly belongs to ${\mathcal{M}}$ .

We start by looking at the term $F_{\rm Mi}(\ell)$ , i.e., the microscopic term and we combine it with the first term in (3.5).

Lemma 3.3.

Fix $\Lambda\in{\mathcal{N}}$ . Fix $\delta>0$ and pick sequences $\ell\in{\mathcal{N}}_{N}$ and $R_{N}\to\infty$ such that $d_{R_{N}}(\frac{1}{N}\ell,\Lambda)\leq\delta$ for all $N$ . Then, for any $R\in\mathbb{N}$ , as $N\to\infty$ ,

[TABLE]

where $\lim\limits_{R\to\infty}\gamma_{R}=0$ and $\lim\limits_{\delta\downarrow 0}C_{R}(\delta)=0$ .

Proof.

For any fixed $k\leq R_{N}$ , we use the upper bound in (2.7), the fact that $1-x\leq{\operatorname{e}}^{-x}$ and Stirling’s lower bound for $\ell_{k}!$ (notice that for $k$ small we expect $\ell_{k}$ to be large, $\Theta(N)$ ) to obtain

[TABLE]

We obtain, uniformly for $\ell\in{\mathcal{N}}_{N}$ , using that $\sum_{k=1}^{R_{N}}\frac{t}{2N}k^{2}\ell_{k}\leq\frac{t}{2}R_{N}c_{\mathrm{Mi}}(\ell/N)$ ,

[TABLE]

where

[TABLE]

is the cut-off version of the rate function defined in (1.5). Recall that $d_{R_{N}}(\frac{1}{N}\ell,\Lambda)<\delta$ and that $c_{\Lambda}=\sum_{k\in\mathbb{N}}k\lambda_{k}\in[0,1]$ and observe that $\lim_{R\to\infty}I^{{{\scriptscriptstyle{({R}})}}}_{\rm Mi}(\Lambda;t)=I_{\rm Mi}(\Lambda;t)$ .

To prove (3.6), we notice that $f^{{{\scriptscriptstyle{({R}})}}}(\cdot;t)$ is continuous, so that $\sup\limits_{\ell\colon d_{R}(\frac{1}{N}\ell,\Lambda)<\delta}|f^{{{\scriptscriptstyle{({R}})}}}(\frac{1}{N}\ell;t)-f^{{{\scriptscriptstyle{({R}})}}}(\Lambda;t)|$ vanishes as $\delta\downarrow 0$ and can therefore be estimated against such a $C_{R}(\delta)$ . Moreover, we estimate (substituting $\frac{1}{N}\ell$ by $\widetilde{\Lambda}$ ), for any $N$ such that $R_{N}>R$ , with the help of the Stirling bound $k!{\operatorname{e}}^{k}k^{-k}\geq 1$ and Jensen’s inequality for $\varphi(x)=x\log x$ , as follows:

[TABLE]

for some $c>0$ , where we used that the remainder sum $\sum_{k>R}\frac{1}{k^{2}}$ is of order $1/R$ as $R\to\infty$ and that $\sum^{R_{N}}_{k=R+1}\widetilde{\lambda}_{k}\leq 1/R$ since $\sum_{k}k\widetilde{\lambda}_{k}\leq 1$ and that the map $x\mapsto x\log(cRx)$ is decreasing in $(0,1/({\operatorname{e}}Rc))$ , introducing some $-\gamma_{R}$ that vanishes as $R\to\infty$ . This proves the claim (3.6). ∎
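The two elementary facts used at the end of this proof are easy to confirm numerically. The following minimal sketch, with hypothetical function names and arbitrary illustrative values of $c$ and $R$, checks that $\sum_{k>R}k^{-2}$ lies between $1/(R+1)$ and $1/R$ (by comparison with $\int x^{-2}\,{\rm d}x$) and that $x\mapsto x\log(cRx)$ is decreasing on $(0,1/({\operatorname{e}}Rc))$, where its derivative $\log(cRx)+1$ is negative.

```python
import math


def tail_inverse_squares(R: int) -> float:
    """sum_{k > R} 1 / k^2, computed as pi^2 / 6 minus the first R terms."""
    return math.pi ** 2 / 6 - sum(1.0 / k ** 2 for k in range(1, R + 1))


def g(x: float, c: float, R: int) -> float:
    """The map x -> x * log(c * R * x)."""
    return x * math.log(c * R * x)


def is_decreasing_on_interval(c: float, R: int, points: int = 99) -> bool:
    """Check monotonicity of g on an equidistant grid inside (0, 1 / (e * R * c))."""
    upper = 1.0 / (math.e * R * c)
    grid = [upper * i / (points + 1) for i in range(1, points + 1)]
    values = [g(x, c, R) for x in grid]
    return all(a > b for a, b in zip(values, values[1:]))
```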

Notice that the last term on the right-hand side of (3.6) cannot be further estimated with the help of continuity (since $\Lambda\mapsto c_{\Lambda}$ is not continuous), but will be handled jointly with the corresponding macroscopic and mesoscopic terms. Next we focus on the term $F_{\rm Ma}(\ell)$ , the macroscopic term, and we proceed analogously.

Lemma 3.4.

Fix $\alpha\in{\mathcal{M}}$ . Fix $\rho>0$ and pick sequences $\ell\in{\mathcal{N}}_{N}$ and $\varepsilon_{N}\downarrow 0$ such that $D_{\varepsilon_{N}}(\ell_{\lfloor\cdot N\rfloor},\alpha)\leq\rho$ for all $N$ . Further assume that $|\frac{1}{\varepsilon_{N}}\log\varepsilon_{N}|\leq o(N)$ . Then, for any $\varepsilon>0$ , as $N\to\infty$ ,

[TABLE]

for some $C_{\varepsilon}(\rho)$ and $\gamma_{\varepsilon}$ that satisfy $\lim_{\varepsilon\downarrow 0}\gamma_{\varepsilon}=0$ and $\lim_{\rho\downarrow 0}C_{\varepsilon}(\rho)=0$ .

Proof.

We use the upper bound in Lemma 2.4 and Stirling’s lower bound for $k!$ (in this case we know that $k$ is large and we expect $\ell_{k}$ small). We obtain, for $k\in\{\varepsilon_{N}N,\dots,N\}$ ,

[TABLE]

where $q_{N}=\log\left(1-\frac{t_{N}}{N}\right)$ . We pair $F_{\rm Ma}(\ell)$ with the second term in (3.5), and we obtain, uniformly for $\ell\in{\mathcal{N}}_{N}$ ,

[TABLE]

where

[TABLE]

with

[TABLE]

denotes the cut-off version of the rate function $I_{\rm Ma}$ defined in (1.5). Indeed, to prove the last line of (3.10) we proceed as follows. In the product, we insert the factor $(1-{\operatorname{e}}^{-kt/N})^{k\ell_{k}}$ and its reciprocal, substitute $\exp\circ\log$ and turn the sum over $k$ into an integral over $x$ . Then most of the terms are easily asymptotically identified with the corresponding terms in (3.11), with the possible exception of the term

[TABLE]

of which we now show that it is not larger than ${\operatorname{e}}^{o(N)}$ . Now write $\ell$ in terms of $s=(s_{i})_{i\in\{1,\dots,n\}}\in E_{N}$ defined in (2.2), such that $\sum_{i}s_{i}=N$ , and pick $i^{*}$ minimal such that $s_{i^{*}+1}<\varepsilon_{N}N$ . Then, also using the inequalities $\log(1+y)\leq y$ and $1-{\operatorname{e}}^{-x}\leq x$ , we see that

[TABLE]

Recall the definition of $q_{N}$ to see that $q_{N}+t/N\leq o(1/N)$ . Use Jensen’s inequality and $\sum_{i}s_{i}\leq N$ to see that the entire last term is not larger than $o(N)$ . To handle the last missing term in (3.12), notice (because of $\sum_{k}k\ell_{k}=N$ ) that $\sum_{k\geq\varepsilon_{N}N}\ell_{k}\leq 1/\varepsilon_{N}$ and $1-{\operatorname{e}}^{kq_{N}}\geq 1-{\operatorname{e}}^{\varepsilon_{N}Nq_{N}}\geq\varepsilon_{N}Nq_{N}\sim\varepsilon_{N}t$ and therefore

[TABLE]

where we recall that we assumed that $|\frac{1}{\varepsilon_{N}}\log\varepsilon_{N}|\leq o(N)$ . Hence, the term in (3.12) is not larger than ${\operatorname{e}}^{o(N)}$ .

To prove (3.9), we first observe that $g^{{{\scriptscriptstyle{({\varepsilon}})}}}(\cdot;t)$ is continuous and hence $|g^{{{\scriptscriptstyle{({\varepsilon}})}}}(\ell_{\lfloor\cdot N\rfloor};t)-g^{{{\scriptscriptstyle{({\varepsilon}})}}}(\alpha;t)|$ can be estimated against such a $C_{\varepsilon}(\rho)$ , uniformly in $N\in\mathbb{N}$ and $\ell$ such that $D_{\varepsilon_{N}}(\ell_{\lfloor\cdot N\rfloor},\alpha)\leq\rho$ . Furthermore, for any $\varepsilon>0$ and any $N\in\mathbb{N}$ such that $\varepsilon_{N}<\varepsilon$ ,

[TABLE]

since $\log\frac{x}{1-{\operatorname{e}}^{-x}}\geq 0$ for all $x>0$ . Hence, we arrived at the bound in (3.9). ∎
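The proofs of Lemmas 3.3 and 3.4 rely on four elementary inequalities: $1-x\leq{\operatorname{e}}^{-x}$, $\log(1+y)\leq y$, $1-{\operatorname{e}}^{-x}\leq x$ and $\log\frac{x}{1-{\operatorname{e}}^{-x}}\geq 0$ for $x>0$. The sketch below checks them on a grid of positive arguments; it is only a numerical illustration, and the function names and the tolerance `tol` are hypothetical.

```python
import math


def log_ratio(x: float) -> float:
    """log(x / (1 - e^{-x})) for x > 0; expm1 keeps accuracy for small x."""
    return math.log(x / -math.expm1(-x))


def check_elementary_inequalities(xs, tol: float = 1e-12) -> bool:
    """True if all four inequalities hold (up to rounding) for every x > 0 in xs."""
    for x in xs:
        holds = (
            log_ratio(x) >= -tol                  # log(x / (1 - e^{-x})) >= 0
            and 1.0 - x <= math.exp(-x) + tol     # 1 - x <= e^{-x}
            and math.log1p(x) <= x + tol          # log(1 + y) <= y with y = x
            and -math.expm1(-x) <= x + tol        # 1 - e^{-x} <= x
        )
        if not holds:
            return False
    return True


if __name__ == "__main__":
    grid = [1e-6 * 1.5**j for j in range(60)]  # geometric grid of positive points
    print(check_elementary_inequalities(grid))
```

The last of these inequalities follows from the third: since $1-{\operatorname{e}}^{-x}\leq x$, the ratio $x/(1-{\operatorname{e}}^{-x})$ is at least one.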

Notice that again we refrain from estimating the term ${\operatorname{e}}^{-N(\frac{t}{2}-\log t)(c_{\mathrm{Ma}}(\ell/N)-c_{\alpha})}$ , which needs to be coupled with the microscopic and the mesoscopic part. Then we are left to handle the middle term in (3.3).

Lemma 3.5.

Fix $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ such that $c_{\Lambda}+c_{\alpha}\leq 1$ . Fix $\delta,\rho>0$ and pick sequences $\ell\in{\mathcal{N}}_{N}$ and $R_{N}\to\infty$ and $\varepsilon_{N}\downarrow 0$ such that $d_{R_{N}}(\frac{1}{N}\ell,\Lambda)\leq\delta$ and $D_{\varepsilon_{N}}(\ell_{\lfloor\cdot N\rfloor},\alpha)\leq\rho$ for all $N$ . Further assume that $|\frac{1}{\varepsilon_{N}}\log\varepsilon_{N}|\leq o(N)$ . Then, as $N\to\infty$ ,

[TABLE]

Proof.

We use again the upper bound in (2.7) and Stirling’s formula, to see that

[TABLE]

We claim that the right-hand side is equal to $(t{\operatorname{e}}^{-t/2})^{Nc_{\mathrm{Me}}(\ell/N)}{\operatorname{e}}^{NL_{N}(\ell)}$ for some $L_{N}(\ell)$ that vanishes, uniformly in $\ell$ , as $N\to\infty$ . First note that the next-to-last term is of this form, since $\frac{t}{2N}\sum_{k=R_{N}+1}^{\lfloor\varepsilon_{N}N\rfloor}k^{2}\ell_{k}\leq\frac{t}{2}\varepsilon_{N}Nc_{\mathrm{Me}}(\ell/N)$ . Furthermore, $\sum_{k=R_{N}+1}^{\lfloor\varepsilon_{N}N\rfloor}\ell_{k}\leq N/R_{N}$ , which shows that the terms containing $t$ and ${\operatorname{e}}$ in the first product also contribute only a factor ${\operatorname{e}}^{o(N)}$ . With the same approach as in (3.8), we see the lower bound

[TABLE]

Therefore, uniformly in $\ell$ such that $D_{\varepsilon_{N}}(\ell_{\lfloor\cdot N\rfloor},\alpha)\leq\rho$ , we have arrived at the estimate (3.13). ∎
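Two counting bounds are used repeatedly in the proofs above: since $\sum_{k}k\ell_{k}=N$, there are at most $N/R$ components of size larger than $R$, and at most $1/\varepsilon$ components of size at least $\varepsilon N$. The sketch below illustrates both on an arbitrary partition of $N$; the sampling scheme, the parameter `max_size` and all function names are hypothetical and serve only to produce a test configuration.

```python
import random
from collections import Counter


def random_component_sizes(N: int, rng: random.Random, max_size: int) -> list[int]:
    """An arbitrary partition s_1 >= s_2 >= ... of N (illustrative sampling only)."""
    sizes, remaining = [], N
    while remaining > 0:
        s = rng.randint(1, min(remaining, max_size))
        sizes.append(s)
        remaining -= s
    return sorted(sizes, reverse=True)


def size_counts(sizes: list[int]) -> Counter:
    """ell_k = number of components of size k."""
    return Counter(sizes)


def count_at_least(ell: Counter, m: float) -> int:
    """Number of components whose size is at least m."""
    return sum(n for k, n in ell.items() if k >= m)


if __name__ == "__main__":
    N = 1000
    ell = size_counts(random_component_sizes(N, random.Random(0), max_size=200))
    for R in (1, 5, 20, 100):
        print(R, count_at_least(ell, R + 1) <= N / R)
    for eps in (0.01, 0.05, 0.2):
        print(eps, count_at_least(ell, eps * N) <= 1 / eps)
```

Both bounds are instances of the same Markov-type argument: every counted component carries mass at least $R$ (respectively $\varepsilon N$), and the total mass is $N$.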

Now we collect the upper bounds above and substitute them in (3.3), to obtain the following lemma.

Lemma 3.6.

Fix $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ such that $c_{\Lambda}+c_{\alpha}\leq 1$ . Fix $\delta,\rho>0$ and pick sequences $\ell\in{\mathcal{N}}_{N}$ and $R_{N}\to\infty$ and $\varepsilon_{N}\downarrow 0$ such that $d_{R_{N}}(\frac{1}{N}\ell,\Lambda)\leq\delta$ and $D_{\varepsilon_{N}}(\ell_{\lfloor\cdot N\rfloor},\alpha)\leq\rho$ for all $N$ . Further assume that $|\frac{1}{\varepsilon_{N}}\log\varepsilon_{N}|\leq o(N)$ . Then, for any $R\in\mathbb{N}$ and $\varepsilon>0$ ,

[TABLE]

where $K_{R,\varepsilon}(\delta,\rho)$ vanishes as $\delta\downarrow 0$ and $\rho\downarrow 0$ , followed by $R\to\infty$ and $\varepsilon\downarrow 0$ .

Proof.

We collect the upper bound (3.6) from Lemma 3.3, (3.9) from Lemma 3.4 and (3.13) from Lemma 3.5. We substitute them in (3.3), also using (3.5), then we obtain, uniformly in $\ell$ such that $d(\frac{1}{N}\ell,\Lambda)<\delta$ and $D(\ell_{\lfloor\cdot N\rfloor},\alpha)<\rho$ , for any $R\in\mathbb{N}$ and any $\varepsilon>0$ , as $N\to\infty$ ,

[TABLE]

where $K_{R,\varepsilon}(\delta,\rho)$ vanishes as $\delta\downarrow 0$ and $\rho\downarrow 0$ , followed by $R\to\infty$ and $\varepsilon\downarrow 0$ , and we recall that $c_{\rm Me}(\ell/N)=1-c_{\rm Mi}(\ell/N)-c_{\rm Ma}(\ell/N)$ . This implies the upper bound in (3.2) in the case where $c_{\Lambda}+c_{\alpha}\leq 1$ .

∎

In the following lemma, we implicitly use the lower semicontinuity of the maps $\Lambda\mapsto c_{\Lambda}$ and $\alpha\mapsto c_{\alpha}$ to show that when $c_{\Lambda}+c_{\alpha}>1$ , then the event $A_{N,t}(\ell)$ is empty for any $\ell$ such that $d_{R_{N}}(\frac{1}{N}\ell,\Lambda)\leq\delta$ and $D_{\varepsilon_{N}}(\ell_{\lfloor\cdot N\rfloor},\alpha)\leq\rho$ , if $\delta$ and $\rho$ are small enough. This will give the right super-exponential upper bound for $\mathbb{P}_{N,\frac{1}{N}t_{N}}(A_{N,t}(\ell))$ , since $I(\Lambda,\alpha;t)=\infty$ .

Lemma 3.7.

Let $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ be such that $c_{\Lambda}+c_{\alpha}>1$ . Then there exist $R\in\mathbb{N}$ , $\varepsilon,\delta,\rho>0$ and $N_{0}$ large enough such that for all $N>N_{0}$ :

[TABLE]

Proof.

We pick $R\in\mathbb{N}$ so large and $\varepsilon\in(0,1)$ so small that $\sum_{k=1}^{R}k\lambda_{k}+\int_{[\varepsilon,1]}x\,\alpha({\rm d}x)$ is larger than one, say equal to $1+\eta$ for some $\eta>0$ . Then choose $\delta$ and $\rho$ in $(0,1)$ so small that, for any $\ell$ such that $d_{R}(\frac{1}{N}\ell,\Lambda)\leq\delta$ and $D_{\varepsilon}(\ell_{\lfloor\cdot N\rfloor},\alpha)\leq\rho$ , we have $\frac{1}{N}\sum_{k=1}^{R}k\ell_{k}-\sum_{k=1}^{R}k\lambda_{k}\geq-\frac{\eta}{3}$ and $\frac{1}{N}\sum_{k=R+1}^{N}k\ell_{k}-\int_{[\varepsilon,1]}x\,\alpha({\rm d}x)\geq-\frac{\eta}{3}$ . Therefore we see that

[TABLE]

which yields a contradiction. ∎

The remainder of this section deals with the construction of an optimal sequence $(\ell^{{{\scriptscriptstyle{({N}})}}})_{N\in\mathbb{N}}$ , which will give a lower bound on the probability that matches the upper bound from Lemma 3.6. For $N$ large enough define $\ell^{{{\scriptscriptstyle{({N}})}}}\in{\mathcal{N}}_{N}$ by

[TABLE]

where $R_{N}$ is an arbitrary diverging sequence in $\mathbb{N}$ such that $R_{N}\ll N$ .

We notice that our sequence $(\ell^{(N)})_{N\in\mathbb{N}}$ is such that the so-called mesoscopic mass is simply concentrated in components all of the same size (namely $R_{N}$ ) and surprisingly no specific requirement is imposed on $R_{N}$ , except that it diverges. We will underline this in the steps of our proof. It is clear by construction that the following hold

[TABLE]

We now give lower bounds to $\mathbb{P}_{N,\frac{1}{N}t_{N}}(A_{N,t}(\ell^{{{\scriptscriptstyle{({N}})}}}))$ , starting from the formulation in (3.3) and (3.5). By abuse of notation, we will drop the index $(N)$ from $\ell$ .

Lemma 3.8.

Fix $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ such that $I(\Lambda,\alpha;t)<\infty$ and let $\ell$ be defined by (3.14). Then, as $N\to\infty$ ,

[TABLE]

where $I_{\rm Mi}^{{{\scriptscriptstyle{({R_{N}}})}}}(\Lambda;t)$ is defined in (3.7).

Proof.

We use the lower bound in (2.7) from Lemma 2.3 to perform a calculation similar to that for the upper bound (using now Stirling’s upper bound on $\ell_{k}!$ ):

[TABLE]

which proves the claim (3.18). ∎

Notice that no restriction is required on $R_{N}$ in order for the above lower bound to coincide with the upper bound in the proof of Lemma 3.6 (except that $R_{N}$ diverges). Indeed, although the lower bound in (2.7) differs from the upper bound by a factor $\left(1-\frac{t_{N}}{N}\right)^{\frac{(k-1)(k-2)}{2}}$ , the probability $\mu_{t_{N}}(k)$ is paired with $\left(1-\frac{t_{N}}{N}\right)^{\frac{k(N-k)}{2}}$ (the probability that a component is separated from all others), which is an exact term and balances the error coming from (2.7). In a similar way, one checks the following lower bound for the term involving $\ell_{k}$ , for $k=R_{N}$ .

Lemma 3.9.

Fix $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ such that $I(\Lambda,\alpha;t)<\infty$ and let $\ell$ be defined by (3.14). Then, as $N\to\infty$ ,

[TABLE]

Proof.

[TABLE]

The above lower bound relies on the lower bound (2.7) in Lemma 2.3, on Stirling’s upper bound for $R_{N}!$ and $\ell_{R_{N}}!$ (which are both large) and on how we defined $\ell_{R_{N}}$ in (3.14). ∎

We notice that the term coming from Lemma 3.9, in order to give the desired lower bound matching the upper bound from Lemma 3.6, does not add any condition on the sequence $R_{N}$ , which is just supposed to diverge.

Lemma 3.10.

Fix $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ such that $I(\Lambda,\alpha;t)<\infty$ and let $\ell$ be defined by (3.14). Then, for all $\delta\in(0,1)$ , as $N\to\infty$ ,

[TABLE]

where $I^{(\delta)}_{\rm Ma}(\alpha;t)$ is defined in (3.11), and $r_{\delta}$ depends on $\int_{0}^{\delta}x\alpha({\rm d}x)$ and it goes to zero when $\delta\searrow 0$ .

Proof.

Now, fix $\delta\in(0,1)$ . We use the lower bound (2.7) from Lemma 2.3 and Stirling’s upper bound on $k!$ and we write:

[TABLE]

where the remainder is defined as

[TABLE]

By defining $\ell_{k}:=\alpha\big{(}\frac{k-1}{N},\frac{k}{N}\big{]}$ and using that $\log k\leq k$ , we see that

[TABLE]

Finally, we use that $\sum_{h=1}^{\ell_{k}}\log h\leq\int_{1}^{\ell_{k}}\log(y\,){\rm d}y$ and Jensen’s inequality (since $x\log x$ is convex) to give the bound

[TABLE]

Exploiting the bounds above, we can say that

[TABLE]

where $r_{\delta}$ depends on $\int_{0}^{\delta}x\alpha({\rm d}x)$ and it goes to zero when $\delta\searrow 0$ . Notice that we could not handle directly the terms from $R_{N}+1$ to $N$ in one go, so we decided to fix a fictitious threshold $\delta N$ , where $\delta$ can be chosen arbitrarily close to zero. This however does not affect the choice of $R_{N}$ and there is no need to add some condition on $R_{N}$ to handle such small, but macroscopic, components.

The remaining terms (the purely macroscopic ones) in (3.3) are treated as follows. We use Stirling’s upper bound for $k!$ and we see that

[TABLE]

Let us focus on the integral $\int_{(\delta,1]}x\Big{[}\log\Big{(}\frac{\mu_{\lfloor Nx\rfloor}(\frac{1}{N}t_{N})^{\frac{1}{\lfloor Nx\rfloor}}}{x}\Big{)}-\frac{t}{2}(1-x)\Big{]}\alpha({\rm d}x)$ . Now by (2.8) in Lemma 2.4, we know that the integrand converges pointwise to

[TABLE]

Since, for $N$ large enough,

[TABLE]

which is clearly integrable over $x\in(\delta,1]$ with respect to $\alpha$ , we can apply the dominated convergence theorem and we get

[TABLE]

where $I^{(\delta)}_{\rm Ma}(\alpha;t)$ is defined in (3.11). Now, (3.21) together with (3.22) imply (3.20). ∎

Finally, we combine the lower bounds in Lemmas 3.8, 3.9 and 3.10.

Lemma 3.11.

For all $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ such that $I(\Lambda,\alpha;t)<\infty$ , there exists a sequence $(\ell^{{{\scriptscriptstyle{({N}})}}})_{N\in\mathbb{N}}$ such that $\ell^{{{\scriptscriptstyle{({N}})}}}\in{\mathcal{N}}_{N}$ , (3.15) and (3.16) hold, and

[TABLE]

Proof.

From the lower bounds in (3.18), (3.19) and (3.20), we get that there exists $a_{\delta}$ with $\lim_{\delta\searrow 0}a_{\delta}=0$ and

[TABLE]

so (3.23) follows on taking the limit $\delta\searrow 0$ .

∎

Now we are ready to prove Proposition 3.1 by combining the above lemmas.

Proof of Proposition 3.1.

Fix $\delta,\rho>0$ and $N\in\mathbb{N}$ and recall the definition of $A_{N,t}(\ell)$ in (2.5), then we see that

[TABLE]

Because of Lemma 3.2, we only have to give asymptotic estimates on the single summands on the right-hand side of (3.25). Let $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ be such that $c_{\Lambda}+c_{\alpha}\leq 1$ . We fix any diverging sequence $R_{N}$ and vanishing sequence $\varepsilon_{N}$ with $R_{N}<\lfloor\varepsilon_{N}N\rfloor$ and $|\frac{1}{\varepsilon_{N}}\log\varepsilon_{N}|\leq o(N)$ ; then we see that

[TABLE]

as a consequence of Lemma 3.6. Taking first $\delta$ and $\rho$ to zero and then $R\nearrow\infty$ and $\varepsilon\searrow 0$ , we get the desired upper bound.

Given any fixed $\delta$ and $\rho$ , we can construct the sequence $(\ell^{(N)})_{N\in\mathbb{N}}$ from Lemma 3.11 and see that, for such a sequence

[TABLE]

for an $a$ arbitrarily small.

Finally, if $(\Lambda,\alpha)\in{\mathcal{N}}\times{\mathcal{M}}$ are such that $c_{\Lambda}+c_{\alpha}>1$ , Lemma 3.7 gives us that

[TABLE]

∎

4. Corollaries and study of the rate functions

In this section we analyse, for fixed $t\in[0,\infty)$ , the minima of the rate function $I(\Lambda,\alpha;t)$ over the configurations $\Lambda$ and $\alpha$ , respectively, and afterwards the minima of the rate functions for the total masses, ${\mathcal{J}}_{\rm Mi}$ , ${\mathcal{J}}_{\rm Me}$ and ${\mathcal{J}}_{\rm Ma}$ . In particular, we will see in Lemma 4.2 that there is a drastic difference when minimizing $I(\Lambda,\alpha;t)$ over $\Lambda$ , depending on whether $c_{\Lambda}\leq\frac{1}{t}$ or not. Clearly, no such difference can be spotted when $t\leq 1$ , since we do not allow for $c_{\Lambda}>1$ . However, when $t>1$ we see that $c_{\Lambda}>\frac{1}{t}$ is perfectly admissible, and we interpret this as an analytic sign of the phase transition at $t=1$ .

4.1. Rate functions for the microscopic part

We start by minimizing $I(\Lambda,\alpha;t)$ , for a fixed $\Lambda\in{\mathcal{N}}$ over all compatible $\alpha\in{\mathcal{M}}$ . We will obtain the rate function for the microscopic part, and we will see that this minimum is attained for $\alpha$ of the form $\alpha=\delta_{c_{\alpha}}$ . Informally speaking, the following in particular implies that, with probability tending to one, there is at most one macroscopic particle.

Lemma 4.1 (Analysis of the microscopic rate function).

Fix $\Lambda\in{\mathcal{N}}$ and recall that $c_{\Lambda}=\sum_{k\in\mathbb{N}}k\lambda_{k}\in[0,1]$ , then

[TABLE]

where $I(\Lambda,\alpha;t)$ , $I_{\rm Mi}(\Lambda;t)$ and $I_{\rm Ma}(\alpha;t)$ are defined in the statement of Theorem 1.1.

Proof.

Clearly

[TABLE]

Fix $c\in[0,1]$ and $\alpha\in{\mathcal{M}}_{\mathbb{N}_{0}}(c)$ . Note that $\alpha((c,1])=0$ since $\alpha$ is a point measure with $\int_{(0,1]}x\,\alpha({\rm d}x)=c$ . We have, denoting $f_{t}(x)=\log\frac{x}{1-{\operatorname{e}}^{-tx}}+\frac{t}{2}(1-x)$ ,

[TABLE]

since $f_{t}$ is strictly decreasing in $[0,\infty)$ . Indeed,

[TABLE]

We want to prove that $f_{t}^{\prime}(x)<0$ for $x\in[0,\infty)$ . For $y\geq 1$ , this is obvious from above, and for $y\in[0,1)$ , this is easily seen as follows.

[TABLE]

since $\frac{2^{k}}{k!}<2$ for all $k\geq 3$ . Hence, we see that $f_{t}^{\prime}(x)\leq 0$ for $x\in[0,\infty)$ , and (4.1) follows.

Furthermore, when we study $g_{t}(c):=c\log\frac{c}{1-{\operatorname{e}}^{-tc}}+\frac{t}{2}c(1-c)+(1-c_{\Lambda}-c)(\frac{t}{2}-\log t)$ , we see that its derivative is

[TABLE]

which is strictly negative if $\frac{ct{\operatorname{e}}^{-ct}}{(1-{\operatorname{e}}^{-ct})}\neq 1$ . Since $\frac{ct{\operatorname{e}}^{-ct}}{(1-{\operatorname{e}}^{-ct})}<1$ if $ct>0$ , we see that $g_{t}(c)$ is strictly decreasing in $c$ , and hence the optimal value of $c$ is $c=1-c_{\Lambda}$ . ∎
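The two monotonicity claims in this proof can be illustrated numerically. The sketch below evaluates $f_{t}$ and $g_{t}$ exactly as defined in the text and checks that both decrease on a grid. The closed form used in `g_prime_closed_form` is our own simplification of the derivative of $g_{t}$ (it is not quoted from the paper): writing $u=\frac{tc}{1-{\operatorname{e}}^{-tc}}\geq 1$, one finds $g_{t}^{\prime}(c)=\log u+1-u\leq 0$. All function names are hypothetical.

```python
import math


def f(x: float, t: float) -> float:
    """f_t(x) = log(x / (1 - e^{-tx})) + (t/2)(1 - x), for x > 0."""
    return math.log(x / -math.expm1(-t * x)) + 0.5 * t * (1.0 - x)


def g(c: float, t: float, c_lambda: float) -> float:
    """g_t(c) = c log(c/(1-e^{-tc})) + (t/2)c(1-c) + (1-c_Lambda-c)(t/2 - log t)."""
    return (c * math.log(c / -math.expm1(-t * c))
            + 0.5 * t * c * (1.0 - c)
            + (1.0 - c_lambda - c) * (0.5 * t - math.log(t)))


def g_prime_closed_form(c: float, t: float) -> float:
    """Assumed simplification: g_t'(c) = log u + 1 - u with u = tc / (1 - e^{-tc})."""
    u = t * c / -math.expm1(-t * c)
    return math.log(u) + 1.0 - u


def is_strictly_decreasing(values: list[float]) -> bool:
    return all(b < a for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    grid = [i / 200 for i in range(1, 201)]
    for t in (0.5, 1.0, 3.0):
        print(t, is_strictly_decreasing([f(x, t) for x in grid]),
              is_strictly_decreasing([g(c, t, 0.0) for c in grid]))
```

Note that $f_{t}(x)\to\frac{t}{2}-\log t$ as $x\downarrow 0$, which is the coefficient of the remaining mass $1-c_{\Lambda}-c$ in $g_{t}$.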

Now the proof of Corollary 1.2 directly follows from Theorem 1.1, Lemma 4.1 and the contraction principle since the projection $(\Lambda,\alpha)\mapsto\Lambda$ is continuous in the product topology. Let us mention that, since $c_{\Lambda}=\sum_{k\in\mathbb{N}}k\lambda_{k}$ , we can rewrite

[TABLE]

with

[TABLE]

with $p_{k}=\frac{1}{t}\frac{k^{k-2}{\operatorname{e}}^{-k}}{k!}$ for all $k\in\mathbb{N}$ . Hence, the term $\widehat{I}(\Lambda)$ is the relative entropy of two non-normalized measures $\Lambda$ and $p:=(p_{k})_{k\in\mathbb{N}}$ . Notice that the reference measure $p$ is such that

[TABLE]

where

[TABLE]

are the probabilities of the Borel distribution with parameter $\mu\in[0,1]$ . The total mass of $p$ is therefore given by

[TABLE]

where $X$ is Borel distributed with parameter $1$ . Now this expectation [AP98, §4.5] is precisely $\frac{1}{2}$ , which explains why we added and subtracted the term $\frac{1}{2t}$ to ${\mathcal{I}}_{\rm Mi}(\Lambda;t)$ in order to obtain the formulation (4.2). As already mentioned in Section 1.4, the above entropy form for the rate function (4.2) is closely related to the rate function obtained in [BC15, Theorem 1.8], which also takes the form of an entropy with respect to a standard Galton-Watson tree, whose total progeny is precisely Borel distributed.
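The value $\frac{1}{2}$ can be checked numerically. The sketch below assumes the standard form of the Borel distribution, ${\rm Bo}_{\mu}(k)={\operatorname{e}}^{-\mu k}(\mu k)^{k-1}/k!$, so that $t\,p_{k}={\rm Bo}_{1}(k)/k$; the truncation level `K` and the function names are hypothetical. At $\mu=1$ the summands decay only like $k^{-5/2}$, so the truncated sum approaches $\frac{1}{2}$ slowly from below.

```python
import math


def borel_pmf(k: int, mu: float) -> float:
    """Bo_mu(k) = e^{-mu k} (mu k)^{k-1} / k!, evaluated through logarithms."""
    log_p = -mu * k + (k - 1) * math.log(mu * k) - math.lgamma(k + 1)
    return math.exp(log_p)


def mean_inverse(mu: float, K: int) -> float:
    """Truncated E[1/X] = sum_{k<=K} Bo_mu(k) / k for X ~ Bo_mu."""
    return sum(borel_pmf(k, mu) / k for k in range(1, K + 1))


def total_mass_of_p(t: float, K: int) -> float:
    """Truncated total mass sum_{k<=K} p_k, with p_k = k^{k-2} e^{-k} / (t k!)."""
    return mean_inverse(1.0, K) / t


if __name__ == "__main__":
    print(mean_inverse(1.0, 20000))     # close to 1/2
    print(mean_inverse(0.5, 2000))      # close to 1 - 0.5/2 = 0.75
    print(total_mass_of_p(2.0, 20000))  # close to 1/(2t) = 0.25
```

For $\mu<1$ the series converges geometrically, and the output is consistent with the identity $\mathbb{E}[1/X]=1-\frac{\mu}{2}$ that is used in the proof of Lemma 4.2.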

Let us analyse the minimising statistics of the macroscopic part.

Lemma 4.2 (Analysis of the macroscopic rate function).

Fix $\alpha\in{\mathcal{M}}_{\mathbb{N}_{0}}$ and recall that $c_{\alpha}=\int_{(0,1]}x\,\alpha({\rm d}x)\in[0,1]$ , then

[TABLE]

where $C_{\alpha,t}=(1-c_{\alpha})\wedge\frac{1}{t}$ .

*Furthermore, the unique minimizer is equal to $\Lambda^{*}(C_{\alpha,t};t)$ , defined in (1.13). *

Proof.

As in the proof of Lemma 4.1, we see that

[TABLE]

with $\widehat{I}(\Lambda)$ defined in (4.3). Fix $c\in[0,1]$ . Since $\widehat{I}$ is strictly convex on the convex set ${\mathcal{N}}(c)$ , we see by evaluating the variational equations that the only candidate for a minimiser in the interior is

[TABLE]

with $\rho\in\mathbb{R}$ such that $\sum_{k=1}^{\infty}k\lambda^{*}_{k}(c;t)=c$ . Interestingly, we can identify $k\lambda^{*}_{k}(c;t)={\rm Bo}_{\mu_{\rho}}(k)\mu_{\rho}/t$ , where $\mu_{\rho}$ is determined by $\mu_{\rho}-\log\mu_{\rho}=1-\rho$ and ${\rm Bo}_{\mu}$ is defined in (4.4). Note that ${\rm Bo}_{\mu}(k)$ is not summable for $\mu>1$ . Hence, $\rho$ must be picked such that $c=\mu_{\rho}/t$ . The largest value $c$ that can be realised in this way is $c=1/t$ by picking $\rho=0$ . Hence, the preceding is possible at most for $c\in[0,1\wedge\frac{1}{t}]$ . By continuity and strict monotonicity of $\sum_{k=1}^{\infty}k\lambda^{*}_{k}(c;t)$ in $\rho$ , indeed, any $c\in[0,1\wedge\frac{1}{t}]$ can be uniquely realized, by picking $\rho=-tc+\log tc+1\leq 0$ such that $\sum_{k=1}^{\infty}k\lambda^{*}_{k}(c;t)=c$ . In this case, it is clear that the minimizer of $\widehat{I}$ in the interior of ${\mathcal{N}}(c)$ is equal to

[TABLE]

as claimed in (1.13), with value

[TABLE]

where we used that if $X\sim{\rm Bo}_{ct}$ , then $\sum_{k=1}^{\infty}\lambda^{*}_{k}(c;t)=c\,\mathbb{E}\left[\frac{1}{X}\right]=c\left(1-\frac{ct}{2}\right)$ , see [AP98, §4.5]. Now we give an argument why $\Lambda^{*}(c;t)$ realises the minimum of $\widehat{I}$ over ${\mathcal{N}}(c)$ . We show that any such minimiser must be positive in every component. Indeed, if ${\lambda}_{k^{*}}=0$ for some $k^{*}\in\mathbb{N}$ , then we consider $\widehat{\Lambda}\in{\mathcal{N}}(c)$ , defined by

[TABLE]

with $\widehat{k}\in\mathbb{N}\setminus\{k^{*}\}$ such that $\lambda_{\widehat{k}}>0$ and $C>0$ such that $\widehat{\Lambda}\in{\mathcal{N}}(c)$ for any sufficiently small $\varepsilon>0$ . Now a simple insertion shows that $\widehat{I}(\widehat{\Lambda})<\widehat{I}(\Lambda)$ , if $\varepsilon>0$ is small enough, since the slope of $\varepsilon\mapsto\varepsilon\log\varepsilon$ at zero is $-\infty$ . Hence, $\Lambda$ cannot be a minimizer. On the other hand, $\Lambda^{*}(c;t)$ has the property that all directional derivatives of $\widehat{I}$ in all admissible directions with compact support are zero; hence it is the minimizer of $\widehat{I}$ over ${\mathcal{N}}(c)$ for $c\in[0,\frac{1}{t}]$ .

When $c>\frac{1}{t}$ , it is possible to pick a sequence of $\Lambda^{{{\scriptscriptstyle{({n}})}}}\in{\mathcal{N}}(c)$ such that $\lim_{n\to\infty}\widehat{I}(\Lambda^{{{\scriptscriptstyle{({n}})}}})=0$ (pick $\lambda_{k}^{{{\scriptscriptstyle{({n}})}}}$ as $\lambda^{*}_{k}(\frac{1}{t};t)+\varepsilon_{n}\delta_{n}(k)$ for some suitable $\varepsilon_{n}>0$ ). Furthermore, since $\widehat{I}(\Lambda)$ is a relative entropy, we know that $\inf_{\Lambda\in{\mathcal{N}}}\widehat{I}(\Lambda)\geq 0.$ Hence, the infimum of $\widehat{I}$ over $\Lambda\in{\mathcal{N}}(c)$ for $c\geq\frac{1}{t}$ is equal to $0$ . This shows that the infimum over $\Lambda\in{\mathcal{N}}(c)$ in the last line of (4.6) is equal to $(c\wedge\frac{1}{t})(\log\left(t(c\wedge\frac{1}{t})\right)-\frac{t}{2}(c\wedge\frac{1}{t}))$ , and (4.5) follows. ∎
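The minimiser can be explored numerically. The sketch below rests on two assumptions that are inferred from this proof rather than quoted from (1.13) and (4.3): first, that $k\lambda^{*}_{k}(c;t)={\rm Bo}_{tc}(k)\,c$ with the standard Borel weights, i.e. $\lambda^{*}_{k}(c;t)=k^{k-2}t^{k-1}c^{k}{\operatorname{e}}^{-tck}/k!$; second, that $\widehat{I}(\Lambda)=\sum_{k}\big[\lambda_{k}\log\frac{\lambda_{k}}{p_{k}}-\lambda_{k}+p_{k}\big]$ with $\sum_{k}p_{k}=\frac{1}{2t}$. Under these assumptions the mass is $c$, the total count is $c(1-\frac{tc}{2})$, and $\widehat{I}(\Lambda^{*}(c;t))=c\log(tc)-\frac{t}{2}c^{2}+\frac{1}{2t}$, which vanishes at $c=\frac{1}{t}$. The truncation level `K` and all names are hypothetical.

```python
import math


def log_lam_star(k: int, c: float, t: float) -> float:
    """log lambda*_k(c;t), assuming lambda*_k = c * Bo_{tc}(k) / k."""
    mu = t * c
    return (math.log(c) - mu * k + (k - 1) * math.log(mu * k)
            - math.lgamma(k + 1) - math.log(k))


def log_p_ref(k: int, t: float) -> float:
    """log p_k with p_k = (1/t) k^{k-2} e^{-k} / k!."""
    return (k - 2) * math.log(k) - k - math.lgamma(k + 1) - math.log(t)


def truncated_statistics(c: float, t: float, K: int = 2000):
    """Return (mass, count, I_hat) of Lambda*(c;t), sums truncated at K (needs tc < 1)."""
    mass = count = entropy = 0.0
    for k in range(1, K + 1):
        log_lam = log_lam_star(k, c, t)
        lam = math.exp(log_lam)
        mass += k * lam
        count += lam
        entropy += lam * (log_lam - log_p_ref(k, t))
    # the total mass of p is taken to be exactly 1/(2t), as stated in the text
    return mass, count, entropy - count + 1.0 / (2.0 * t)


def i_hat_closed_form(c: float, t: float) -> float:
    """Assumed closed form: c log(tc) - (t/2) c^2 + 1/(2t)."""
    return c * math.log(t * c) - 0.5 * t * c * c + 1.0 / (2.0 * t)


if __name__ == "__main__":
    c, t = 0.3, 2.0
    print(truncated_statistics(c, t))
    print(c, c * (1 - t * c / 2), i_hat_closed_form(c, t))
```

The first two terms of the closed form reproduce the expression $c(\log(tc)-\frac{t}{2}c)$ that appears at the end of the proof for $c\leq\frac{1}{t}$.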

Then, the proof of Corollary 1.3 directly follows from Theorem 1.1, Lemma 4.2 and the contraction principle, since the projection is continuous.

Finally, let us draw some conclusions regarding the mesoscopic mass. As stated after Corollary 1.4, it is not possible to apply the contraction principle to derive an LDP for the sequence of random variables $\overline{\rm Me}^{{{\scriptscriptstyle{({N}})}}}_{R_{N},\varepsilon_{N}}(t)$ ; however, we can still identify the rate function by minimizing $I$ over all pairs $(\Lambda,\alpha)$ such that $c_{\Lambda}+c_{\alpha}=1-c$ . The following lemma proves that the rate function ${\mathcal{J}}_{\rm Me}(c;t)$ has exactly the expected form, given by (4.9).

Lemma 4.3.

Fix $t\in[0,\infty)$ . Then, for any $c\in[0,1]$ and any $R_{N}\in\mathbb{N}$ and $\varepsilon_{N}\in(0,1)$ such that $1\ll R_{N}<\varepsilon_{N}N\ll N$ ,

[TABLE]

Proof.

We first verify that, for a fixed $c\in[0,1]$ ,

[TABLE]

Fix $x\in[0,1-c]$ , then for a fixed $\Lambda\in{\mathcal{N}}(x)$

[TABLE]

since the infimum is attained at $\alpha=\delta_{1-c-x}$ , as proved in Lemma 4.1. Then, with the same procedure as in Lemma 4.2, we see that the infimum over $\Lambda\in{\mathcal{N}}(x)$ is attained at

[TABLE]

giving

[TABLE]

Minimizing then over $x\in[0,1-c]$ , we see that the infimum is attained at $x^{*}$ , the smallest solution to

[TABLE]

which is $x^{*}=1-c$ , for all $t\geq\frac{1}{1-c}$ and $x^{*}<1-c$ otherwise. By substituting the optimal $x^{*}$ in (4.10), we see that (4.9) holds.

Now, notice that the procedure to get the upper bound in the proof of Proposition 3.1 implies in a straightforward way that

[TABLE]

In the same way, from the proof of Proposition 3.1, we borrow the strategy of constructing a “recovery sequence”, this time using $\Lambda^{*}(x^{*};t)$ and $\alpha^{*}=\delta_{1-c-x^{*}}$ to construct $\ell^{{{\scriptscriptstyle{({N}})}}}$ as in (3.14). This gives

[TABLE]

∎

The proof of the second point in Corollary 1.4 follows as a direct consequence of Lemma 4.3.

4.2. Proof of Theorem 1.5

Item (1) follows from Lemmas 4.1 and 4.2. Following the approach of those proofs, one can easily see that the order of minimization is not important, in particular:

[TABLE]

The minimizer, given a certain microscopic mass $c\in[0,1]$ , is seen to take the form

[TABLE]

where $\Lambda^{*}$ is defined in (1.13) and $\alpha=\delta_{1-c}$ is a single macroscopic component. Imposing a certain microscopic (respectively macroscopic) mass influences the optimal configuration. Indeed, although it is optimal for the system to avoid mesoscopic mass (as seen in Corollary 1.4(2)), the impossibility of minimizing the microscopic configuration under a certain constraint on the mass, namely $c_{\Lambda}>\frac{1}{t}$ , forces the system to actually have a mesoscopic mass of size $c_{\Lambda}-\frac{1}{t}$ . The same happens when we impose a macroscopic mass which is too small, namely $c_{\alpha}<\frac{t-1}{t}$ . The form of the function ${\mathcal{J}}_{\rm Mi}(c;t)$ in (1.12) comes directly from such minimization procedures.

Let us now prove assertion (2). The form of the minimizing $\Lambda$ follows from Lemma 4.2. Fix $t\in[0,1]$ . Then ${\mathcal{J}}_{\rm Mi}(c;t)=c\log c-tc^{2}+tc+(1-c)\log\frac{1-c}{1-{\operatorname{e}}^{t(c-1)}}$ is strictly decreasing in $c\in[0,1]$ . Indeed

[TABLE]

where we introduced the function $F(x)=x-\log x$ , which is decreasing in $x\in(0,1]$ . Hence, monotonicity of ${\mathcal{J}}_{\rm Mi}(\cdot;t)$ in $[0,1]$ follows from

[TABLE]

The first inequality follows by observing that the function $\phi_{t}(c)={\operatorname{e}}^{-t(1-c)}-c$ is nonnegative for all $c\in[0,1\wedge\frac{1}{t}]$ , since $\phi_{t}(0)={\operatorname{e}}^{-t}>0$ , $\phi_{t}(1)=0$ , and $\phi_{t}$ is strictly decreasing in $[0,1]$ , since $t\leq 1$ . The second inequality follows from the fact that $\psi(z):=1-{\operatorname{e}}^{-z}-z{\operatorname{e}}^{-z}\geq 0$ for all $z\in[0,1]$ (substitute $z=t(1-c)$ ), since $\psi(0)=0$ , $\psi(1)=1-2{\operatorname{e}}^{-1}\geq 0$ and $\psi$ is strictly increasing in $[0,1]$ . Therefore, ${\mathcal{J}}_{\rm Mi}(\cdot;t)$ is minimized in $c=1$ , which implies the conclusion.
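The two auxiliary functions of this argument are easy to probe numerically. The sketch below evaluates $\phi_{t}(c)={\operatorname{e}}^{-t(1-c)}-c$ and $\psi(z)=1-{\operatorname{e}}^{-z}-z{\operatorname{e}}^{-z}$ on a grid; the grid resolution and the function names are hypothetical. It also shows that the sign condition on $\phi_{t}$ fails on $[0,1]$ once $t>1$, which is where the assumption $t\leq 1$ enters.

```python
import math


def phi(c: float, t: float) -> float:
    """phi_t(c) = e^{-t(1-c)} - c."""
    return math.exp(-t * (1.0 - c)) - c


def psi(z: float) -> float:
    """psi(z) = 1 - e^{-z} - z e^{-z}."""
    return 1.0 - math.exp(-z) - z * math.exp(-z)


def nonnegative_on_unit_grid(fn, n: int = 1000, tol: float = 1e-15) -> bool:
    """True if fn(i/n) >= -tol for all i = 0, ..., n."""
    return all(fn(i / n) >= -tol for i in range(n + 1))


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0):
        print(t, nonnegative_on_unit_grid(lambda c: phi(c, t)))
    print(nonnegative_on_unit_grid(psi))
    print(phi(0.9, 2.0))  # negative: the bound on phi breaks down for t > 1
```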

Now we turn to assertion (3). For $t\in(1,\infty)$ , the derivative of ${\mathcal{J}}_{\rm Mi}(c;t)$ reads as follows

[TABLE]

It is clear that ${\mathcal{J}}_{\rm Mi}(c;t)$ is strictly increasing in $c\in(\frac{1}{t},1]$ , while for $c\in[0,\frac{1}{t}]$ , we need to go back to (4.11). The right inequality there is still true for any $c<\frac{1}{t}$ . Since the quotient in (4.11) is strictly increasing in $c$ and since $F(x)=x-\log x$ is strictly convex in $x$ , the unique zero of $\frac{{\rm d}}{{\rm d}c}{\mathcal{J}}_{\rm Mi}(c;t)$ is given by the unique solution $c$ of

[TABLE]

which is precisely the solution $c=\beta_{t}$ of (1.15). The remaining assertions follow.

Acknowledgements

The authors acknowledge three anonymous referees for their careful reviews and many suggestions for improving the exposition. This research has been funded by the Deutsche Forschungsgemeinschaft (DFG) through grant CRC 1114 “Scaling Cascades in Complex Systems”, Project C08.

Bibliography


[Ada08] S. Adams. Large deviations for empirical cycle counts of integer partitions and their relation to systems of Bosons. In Analysis and stochastics of growth processes and interface models, pages 148–172. Oxford Univ. Press, Oxford, 2008.
[Ald97] D. J. Aldous. Brownian excursions, critical random graphs and the multiplicative coalescent. Ann. Probab., 25(2):812–854, 1997.
[Ald99] D. J. Aldous. Deterministic and stochastic models for coalescence (aggregation and coagulation): a review of the mean-field theory for probabilists. Bernoulli, 5(1):3–48, 1999.
[AP98] D. J. Aldous and J. Pitman. Tree-valued Markov chains derived from Galton-Watson processes. Annales de l’Institut Henri Poincare (B) Probability and Statistics, 34(5):637–686, 1998.
[Aug18] F. Augeri. Nonlinear large deviation bounds with applications to traces of Wigner matrices and cycles counts in Erdős-Rényi graphs. arXiv preprint 1810.01558, 2018.
[BC15] C. Bordenave and P. Caputo. Large deviations of empirical neighborhood distribution in sparse random graphs. Probability Theory and Related Fields, 163(1-2):149–222, 2015.
[BJR07] B. Bollobás, S. Janson, and O. Riordan. The phase transition in inhomogeneous random graphs. Random Structures & Algorithms, 31(1):3–122, 2007.
[Bol01] B. Bollobás. Random Graphs. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2nd edition, 2001.
