On the computational tractability of statistical estimation on amenable graphs

Ahmed El Alaoui; Andrea Montanari

arXiv:1904.03313·math.PR·September 24, 2019


TL;DR

This paper investigates how the structure of graphs influences the gap between statistically optimal and computationally feasible solutions in estimating discrete variables, showing that amenable graphs allow near-optimal local algorithms, unlike random graphs.

Contribution

It demonstrates that for amenable graphs, simple local algorithms can nearly achieve optimal estimation, contrasting with the persistent gap in random graphs.

Findings

1. Local algorithms achieve near-optimal accuracy on amenable graphs.
2. The information-computation gap persists in random regular graphs.
3. Graph structure critically affects the computational-statistical tradeoff.


Equations

Y_{uv} = \begin{cases} \theta_{u}-\theta_{v} \pmod q & \text{with probability } 1-p,\\ w_{uv} & \text{with probability } p, \end{cases}

\xi_{u}^{(\varepsilon)} = \begin{cases} \theta_{u} & \text{with probability } \varepsilon,\\ \star & \text{with probability } 1-\varepsilon, \end{cases}

(X_{f})_{u,v} := f(\theta_{u})f(\theta_{v}), \quad u,v\in V_{n},

\mathcal{R}_{n}(\widehat{{\bm{X}}};f):=\frac{1}{n^{2}}\operatorname{\mathbb{E}}\Big{[}\big{\|}{\bm{X}}_{f}-\widehat{{\bm{X}}}\big{\|}_{F}^{2}\Big{]}\,.

\widehat{{\bm{X}}}^{\text{Bayes}}:=\Big{(}\operatorname{\mathbb{E}}\Big{[}f(\theta_{u})f(\theta_{v})|Y_{G_{n}}^{(\varepsilon)}\Big{]}\Big{)}_{u,v\in V_{n}}.

\lim_{l\to\infty}\lim_{n\to\infty}\big{\{}\mathcal{R}_{n}(\widehat{{\bm{X}}}^{(l)};f)-\mathcal{R}_{n}^{\textup{Bayes}}(f)\big{\}}=0.

Y_{uv}\sim Q(\,\cdot\,|\theta_{u},\theta_{v})=\mathcal{N}(\theta_{u}\theta_{v},\sigma_{n}^{2}),

\operatorname{\mathbb{E}}_{\rho}\Big{[}\sum_{u\in V(G)}f(G,o,u)\Big{]}=\operatorname{\mathbb{E}}_{\rho}\Big{[}\sum_{u\in V(G)}f(G,u,o)\Big{]},

\inf\Big{\{}|\partial S|/|S|:S\subset V\textup{ finite},o\in S\Big{\}}=0.

\limsup_{k\to\infty}\rho\Big{(}(G,o)~{}:\sum_{u\in V(G)}\frac{{\mathbf{1}}_{o\in S_{k}(G,u)}}{|S_{k}(G,u)|}\leq\delta\Big{)}\leq\eta.

\alpha_{k}(G,o):=\sum_{u\in V(G)}\frac{{\mathbf{1}}_{o\in S_{k}(G,u)}}{|S_{k}(G,u)|}.

\operatorname{\mathbb{E}}_{\rho}[\alpha_{k}(G,o)]=\operatorname{\mathbb{E}}_{\rho}\Big[\sum_{u\in V(G)}{\mathbf{1}}_{u\in S_{k}(G,o)}\frac{1}{|S_{k}(G,o)|}\Big]=\operatorname{\mathbb{E}}_{\rho}\Big[\frac{|S_{k}(G,o)|}{|S_{k}(G,o)|}\Big]=1.

\alpha_{k}(G,o)\geq\frac{1}{c_{2}\ell_{k}^{d}}\sum_{x\in V(G)}{\mathbf{1}}_{x\in S_{k}(G,o)}=\frac{1}{c_{2}\ell_{k}^{d}}|S_{k}(G,o)|.

\mu_{G_{n},u}(x):=\operatorname{\mathbb{P}}\Big{(}\theta_{u}=x\big{|}Y^{(\varepsilon)}_{G_{n}}\Big{)},~{}~{}\mbox{for }u\in V_{n}\mbox{ and }x\in{\mathcal{X}}.

\widehat{\mu}_{G_{n},u,l}(x):=\operatorname{\mathbb{P}}\Big{(}\theta_{u}=x\big{|}Y^{(\varepsilon)}_{B_{G_{n}}(u,l)}\Big{)}.

\lim_{M\to\infty}\lim_{n\to\infty}\operatorname{\mathbb{P}}\big(|B_{G_{n}}(o_{n},l)|\geq M\big)=0.

\lim_{l\to\infty}\lim_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\big{[}d_{\mbox{\rm\tiny TV}}(\widehat{\mu}_{G_{n},u,l},\mu_{G_{n},u})\big{]}=0.

\mu_{G,o}(x):=\operatorname{\mathbb{P}}\Big{(}\theta_{o}=x\big{|}(G,o),Y^{(\varepsilon)}_{G}\Big{)},

\lim_{l\to\infty}\lim_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}^{2}(x)\big]\quad\mbox{and}\quad\lim_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\big[\mu_{G_{n},u}^{2}(x)\big]

\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)\mu_{G_{n},u}(x)\big]=\operatorname{\mathbb{E}}\big[\operatorname{\mathbb{P}}\big(\theta_{u}=x|Y^{(\varepsilon)}_{B_{G_{n}}(u,l)}\big)^{2}\big]=\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)^{2}\big].

\operatorname{\mathbb{E}}\big[d_{\mbox{\rm\tiny TV}}(\widehat{\mu}_{G_{n},u,l},\mu_{G_{n},u})\big]^{2}\leq\frac{1}{4}|{\mathcal{X}}|\sum_{x\in{\mathcal{X}}}\Big(\operatorname{\mathbb{E}}\big[\mu_{G_{n},u}(x)^{2}\big]-\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)^{2}\big]\Big)\,.

\widehat{X}^{(\mbox{\rm\tiny dec})}_{uv}=\Big(\sum_{x\in{\mathcal{X}}}\mu_{G_{n},u}(x)f(x)\Big)\cdot\Big(\sum_{x\in{\mathcal{X}}}\mu_{G_{n},v}(x)f(x)\Big),\quad u,v\in V_{n}.

\lim_{n\to\infty}\big{\{}\mathcal{R}_{n}(\widehat{{\bm{X}}}^{(\mbox{\rm\tiny dec})};f)-\mathcal{R}_{n}^{\textup{Bayes}}(f)\big{\}}=0\,.

\widehat{X}^{(l)}_{uv}:=\Big{(}\sum_{x\in{\mathcal{X}}}\widehat{\mu}_{G_{n},u,l}(x)f(x)\Big{)}\cdot\Big{(}\sum_{x\in{\mathcal{X}}}\widehat{\mu}_{G_{n},v,l}(x)f(x)\Big{)}\,.

\mathcal{R}_{n}(\widehat{{\bm{X}}}^{(l)};f)-\mathcal{R}_{n}(\widehat{{\bm{X}}}^{(\mbox{\rm\tiny dec})};f)=-\frac{2}{n^{2}}\operatorname{\mathbb{E}}\big{\langle}\widehat{{\bm{X}}}^{(l)}-\widehat{{\bm{X}}}^{(\mbox{\rm\tiny dec})},{\bm{X}}_{f}\big{\rangle}+\frac{1}{n^{2}}\big{(}\operatorname{\mathbb{E}}\|\widehat{{\bm{X}}}^{(l)}\|_{F}^{2}-\operatorname{\mathbb{E}}\|\widehat{{\bm{X}}}^{(\mbox{\rm\tiny dec})}\|_{F}^{2}\big{)}.

\operatorname{\mathbb{E}}\big\langle\widehat{{\bm{X}}}^{(l)}-\widehat{{\bm{X}}}^{(\mbox{\rm\tiny dec})},{\bm{X}}_{f}\big\rangle=\sum_{u,v\in V_{n}}\operatorname{\mathbb{E}}\Big[\Big(\operatorname{\mathbb{E}}\big[f(\theta_{u})|Y^{(\varepsilon)}_{B_{G_{n}}(u,l)}\big]\operatorname{\mathbb{E}}\big[f(\theta_{v})|Y^{(\varepsilon)}_{B_{G_{n}}(v,l)}\big]-\operatorname{\mathbb{E}}\big[f(\theta_{u})|Y^{(\varepsilon)}_{G_{n}}\big]\operatorname{\mathbb{E}}\big[f(\theta_{v})|Y^{(\varepsilon)}_{G_{n}}\big]\Big)f(\theta_{u})f(\theta_{v})\Big].

\|f\|_{\infty}^{2}\sum_{u,v\in V_{n}}


Full text

On the computational tractability of statistical estimation on amenable graphs

Ahmed El Alaoui and Andrea Montanari

Department of Electrical Engineering and Department of Statistics, Stanford University

Abstract

We consider the problem of estimating a vector of discrete variables ${\bm{\theta}}=(\theta_{1},\cdots,\theta_{n})$, based on noisy observations $Y_{uv}$ of the pairs $(\theta_{u},\theta_{v})$ on the edges of a graph $G=([n],E)$. This setting comprises a broad family of statistical estimation problems, including group synchronization on graphs, community detection, and low-rank matrix estimation.

A large body of theoretical work has established sharp thresholds for weak and exact recovery, and sharp characterizations of the optimal reconstruction accuracy in such models, focusing however on the special case of Erdős–Rényi-type random graphs. The single most important finding of this line of work is the ubiquity of an information-computation gap. Namely, for many models of interest, a large gap is found between the optimal accuracy achievable by any statistical method and the optimal accuracy achieved by known polynomial-time algorithms. Moreover, this gap is generally believed to be robust to small amounts of additional side information revealed about the $\theta_{i}$'s.

How does the structure of the graph $G$ affect this picture? Is the information-computation gap a general phenomenon or does it only apply to specific families of graphs?

We prove that the picture is dramatically different for graph sequences converging to amenable graphs (including, for instance, $d$-dimensional grids). We consider a model in which an arbitrarily small fraction of the vertex labels is revealed, and show that a linear-time local algorithm can achieve reconstruction accuracy that is arbitrarily close to the information-theoretic optimum. We contrast this to the case of random graphs. Indeed, focusing on group synchronization on random regular graphs, we prove that the information-computation gap still persists even when a small amount of side information is revealed.

1 Introduction

Classical statistics focuses on problems in which a small number of parameters needs to be estimated from data. As a consequence, it is mostly unconcerned with computational complexity considerations. Fundamental limits to statistical estimation are proven on the basis of information-theoretic considerations. By contrast, in modern high-dimensional applications, it is not uncommon to come across statistical models that require simultaneously estimating thousands or even millions of parameters. In this setting, a large gap is often observed between information-theoretic limits and what is achieved by the best known polynomial-time algorithms. Indeed, it is expected that no polynomial-time algorithm can achieve optimal statistical performance in general. In specific classes of models, a precise information-computation gap has been conjectured on the basis of current knowledge (see, e.g., [MM09, DKMZ11, MR14, LM17, BKM+19, CM19] and references therein).

As explained below, most of our understanding of this information-computation gap was developed by analyzing probabilistic models with a high degree of exchangeability. This suggests a natural question: is the same gap present in models with other types of structure?

Statistical estimation on graphs provides a rich and interesting setting to study this question. Let $G_{n}=(V_{n},E_{n})$ be a graph on $n$ vertices, $V_{n}=[n]$. Edges are assumed to be directed in an arbitrary way, i.e., they are ordered pairs $(u,v)\in V_{n}\times V_{n}$. We associate to the vertices $u\in V_{n}$ random variables ${\bm{\theta}}=(\theta_{u})_{u\in V_{n}}\sim_{\text{iid}}{\sf Unif}({\mathcal{X}})$, uniformly distributed on a finite alphabet ${\mathcal{X}}$. For each edge $(u,v)\in E_{n}$, we observe $Y_{uv}\in{\mathcal{Y}}$, where ${\mathcal{Y}}$ is also a finite alphabet. The observations are conditionally independent with $Y_{uv}|{\bm{\theta}}\sim Q(\,\cdot\,|\theta_{u},\theta_{v})$, where $Q$ is a probability kernel from ${\mathcal{X}}\times{\mathcal{X}}$ to ${\mathcal{Y}}$. Given the edge observations ${Y}$ (and, possibly, additional side information, see below), the purpose is to estimate the vertex assignment ${\bm{\theta}}$.

This model is general enough to include a broad variety of examples studied in the literature, including group synchronization, community detection, and low-rank matrix estimation. As an example, consider the $\mathbb{Z}_{q}$-synchronization problem (further examples are presented in Section 3.1). The unknown variables $(\theta_{u})_{u\in V_{n}}$ are i.i.d. uniform in ${\mathcal{X}}=\mathbb{Z}_{q}=\{0,\cdots,q-1\}$, which we identify with the cyclic group $\mathbb{Z}/q\mathbb{Z}$ with additive structure. Observations are noisy measurements of the difference between $\theta_{u}$ and $\theta_{v}$ for each edge $(u,v)\in E_{n}$:

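$$Y_{uv}=\begin{cases}\theta_{u}-\theta_{v}\pmod q & \text{with probability }1-p,\\ w_{uv} & \text{with probability }p,\end{cases}\tag{1.1}$$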

where $(w_{uv})_{(u,v)\in E_{n}}$ is a collection of independent random variables $w_{uv}\sim{\sf Unif}({\mathcal{X}})$, independent of $(\theta_{u})_{u\in V_{n}}$.
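To make the $\mathbb{Z}_{q}$-synchronization model concrete, the following Python sketch samples ${\bm{\theta}}$ and the edge observations $Y$. It assumes, for illustration only, that $G_{n}$ is an $L\times L$ box of ${\mathbb{Z}}^{2}$ (so $n=L^{2}$) with each edge oriented from the smaller to the larger vertex index; `grid_edges` and `sample_zq_sync` are hypothetical helper names, not code from the paper.

```python
import random


def grid_edges(L):
    """Edges of the L x L box in Z^2, with vertex (i, j) indexed as i * L + j.

    Each edge is stored once as an ordered pair (u, v) with u < v; the
    orientation is arbitrary, as allowed by the model.
    """
    edges = []
    for i in range(L):
        for j in range(L):
            u = i * L + j
            if j + 1 < L:
                edges.append((u, u + 1))  # horizontal neighbour
            if i + 1 < L:
                edges.append((u, u + L))  # vertical neighbour
    return edges


def sample_zq_sync(n, edges, q, p, rng):
    """Draw theta ~ iid Unif(Z_q) and one observation Y_uv per edge.

    With probability 1 - p, Y_uv = theta_u - theta_v (mod q); with probability p,
    Y_uv is an independent uniform element w_uv of Z_q.
    """
    theta = [rng.randrange(q) for _ in range(n)]
    Y = {}
    for u, v in edges:
        if rng.random() < p:
            Y[(u, v)] = rng.randrange(q)  # pure noise w_uv
        else:
            Y[(u, v)] = (theta[u] - theta[v]) % q  # noiseless difference
    return theta, Y


rng = random.Random(0)
edges = grid_edges(4)  # n = 16 vertices, 24 edges
theta, Y = sample_zq_sync(16, edges, q=3, p=0.2, rng=rng)
```

Setting $p=0$ makes every observation equal to $(\theta_{u}-\theta_{v}) \bmod q$; on a connected graph this pins down ${\bm{\theta}}$ only up to a global cyclic shift, which is the kind of symmetry the vertex side information is meant to break.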

In addition to the observations ${Y}$, we consider independent observations $(\xi^{(\varepsilon)}_{u})_{u\in V_{n}}$ on the vertices of $G_{n}$:

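$$\xi^{(\varepsilon)}_{u}=\begin{cases}\theta_{u} & \text{with probability }\varepsilon,\\ \star & \text{with probability }1-\varepsilon,\end{cases}\tag{1.2}$$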

where $\star$ is a symbol not belonging to ${\mathcal{X}}$, so that with probability $\varepsilon$ the value of $\theta_{u}$ is directly observed. We will write ${\mathcal{X}}_{\star}:={\mathcal{X}}\cup\{\star\}$. Following the information theory literature, we refer to this noise model as the Binary Erasure Channel, and denote it by BEC$(\bar{\varepsilon})$. (It is customary to parametrize the BEC by its erasure probability $\bar{\varepsilon}=1-\varepsilon$.) The parameter $\varepsilon$ will be considered very small (eventually going to zero as $n$ becomes large). The purpose of this side information is to break any group symmetry (sign symmetry, or cyclic shifts in the case of $\mathbb{Z}_{q}$) that would otherwise be preserved by the observations $Y$.
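A minimal sketch of the erasure side information, assuming integer labels and using Python's `None` to stand in for $\star$ (a representation choice made here, not in the paper); `reveal` and `ERASED` are hypothetical names. Each label is kept independently with probability $\varepsilon$, i.e., erased with probability $\bar{\varepsilon}=1-\varepsilon$.

```python
import random

ERASED = None  # stands in for the symbol ⋆, which is not an element of Z_q


def reveal(theta, eps, rng):
    """Side information: xi_u = theta_u with probability eps, otherwise ERASED."""
    return [t if rng.random() < eps else ERASED for t in theta]


rng = random.Random(0)
theta = [rng.randrange(3) for _ in range(10_000)]
xi = reveal(theta, eps=0.05, rng=rng)
revealed_fraction = sum(x is not ERASED for x in xi) / len(xi)  # close to eps
```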

We consider two metrics for the estimation accuracy. In our first definition, the goal is to estimate the $n\times n$ rank-one matrix ${\bm{X}}_{f}$ whose entries are

$$({\bm{X}}_{f})_{u,v}:=f(\theta_{u})f(\theta_{v}),\qquad u,v\in V_{n},\tag{1.3}$$

where $f:{\mathcal{X}}\to\mathbb{R}$ is a given real-valued function. For instance, by setting $f(\theta)={\mathbf{1}}_{\theta=x}$ and then considering all values of $x\in{\mathcal{X}}$, one can estimate whether $\theta_{u}=\theta_{v}$ for each pair of vertices $u,v\in V_{n}$, since $\sum_{x\in{\mathcal{X}}}{\mathbf{1}}_{\theta_{u}=x}{\mathbf{1}}_{\theta_{v}=x}={\mathbf{1}}_{\theta_{u}=\theta_{v}}$. An estimator is a map $\widehat{{\bm{X}}}:{\mathcal{Y}}^{E_{n}}\times{\mathcal{X}}_{\star}^{V_{n}}\to{\mathbb{R}}^{n\times n}$, i.e., a function of the observations ${Y}$ and the side information $\xi^{(\varepsilon)}$. We evaluate its risk under the square loss

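$$\mathcal{R}_{n}(\widehat{{\bm{X}}};f):=\frac{1}{n^{2}}\operatorname{\mathbb{E}}\Big[\big\|{\bm{X}}_{f}-\widehat{{\bm{X}}}\big\|_{F}^{2}\Big]\,.\tag{1.4}$$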
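For a single realization, the quantity inside the expectation in $\mathcal{R}_{n}$ can be computed directly; averaging it over independent draws of ${\bm{\theta}}$, $Y$ and $\xi^{(\varepsilon)}$ would give a Monte Carlo estimate of the risk. The sketch below assumes $\mathcal{X}=\mathbb{Z}_{2}$ and the centred choice $f(\theta)=(-1)^{\theta}$; `empirical_loss` is a hypothetical name.

```python
def empirical_loss(theta, f, X_hat):
    """(1/n^2) * ||X_f - X_hat||_F^2 for one realization, with (X_f)_uv = f(theta_u) f(theta_v)."""
    n = len(theta)
    fv = [f(t) for t in theta]
    total = 0.0
    for u in range(n):
        for v in range(n):
            total += (fv[u] * fv[v] - X_hat[u][v]) ** 2
    return total / n**2


def f(t):
    """Centred function on Z_2: f(theta) = (-1)^theta, so E[f(theta)] = 0."""
    return 1 - 2 * t


theta = [0, 1, 1, 0]
X_f = [[f(a) * f(b) for b in theta] for a in theta]
zero = [[0.0] * len(theta) for _ in theta]
loss_perfect = empirical_loss(theta, f, X_f)  # 0.0
loss_trivial = empirical_loss(theta, f, zero)  # 1.0, since f^2 = 1
```

Because $f^{2}\equiv 1$ here, the trivial estimator $\widehat{{\bm{X}}}=0$ has loss exactly $1$. Relabelling every $\theta_{u}\mapsto\theta_{u}+1$ flips the sign of each $f(\theta_{u})$ but leaves ${\bm{X}}_{f}$ unchanged, which is one reason to target the rank-one matrix rather than ${\bm{\theta}}$ itself.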

We denote by $\mathcal{R}_{n}^{\text{Bayes}}(f)$ the minimal achievable error, i.e., the one achieved by the posterior expectation

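$$\widehat{{\bm{X}}}^{\text{Bayes}}:=\Big(\operatorname{\mathbb{E}}\big[f(\theta_{u})f(\theta_{v})\,\big|\,Y_{G_{n}}^{(\varepsilon)}\big]\Big)_{u,v\in V_{n}}.\tag{1.5}$$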

(We have made use of the following notation: for a graph $G=(V,E)$, we denote by $Y^{(\varepsilon)}_{G}$ the union of the vertex and edge observations over $G$: $Y^{(\varepsilon)}_{G}=\{Y_{uv}:(u,v)\in E,\xi^{(\varepsilon)}_{u}:u\in V\}$.) Our second metric for estimation accuracy is the ‘overlap’, which will be introduced in Section 4, see Eq. (4.7).
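On a tiny graph, $\widehat{{\bm{X}}}^{\text{Bayes}}$ can be computed exactly by enumerating all $q^{n}$ assignments, which makes the definition concrete (and shows why this is not a practical algorithm). The sketch assumes the $\mathbb{Z}_{q}$-synchronization channel with noise level $p$ and encodes erased side information as `None`; `zq_channel` and `bayes_estimator` are hypothetical names. The uniform prior and the erasure likelihood take the same value on every assignment consistent with the revealed labels, so they cancel after normalization.

```python
import itertools


def zq_channel(y, tu, tv, q, p):
    """Q(y | theta_u, theta_v) = (1 - p) 1{y = tu - tv mod q} + p / q."""
    return (1 - p) * (y == (tu - tv) % q) + p / q


def bayes_estimator(n, edges, Y, xi, q, p, f):
    """Exact E[f(theta_u) f(theta_v) | Y, xi] by brute force over all q^n assignments.

    xi[u] is None (erased) or the revealed value of theta_u.
    """
    num = [[0.0] * n for _ in range(n)]
    Z = 0.0
    for theta in itertools.product(range(q), repeat=n):
        if any(x is not None and x != t for x, t in zip(xi, theta)):
            continue  # contradicts a revealed label: zero posterior weight
        w = 1.0
        for u, v in edges:
            w *= zq_channel(Y[(u, v)], theta[u], theta[v], q, p)
        Z += w
        fv = [f(t) for t in theta]
        for u in range(n):
            for v in range(n):
                num[u][v] += w * fv[u] * fv[v]
    return [[num[u][v] / Z for v in range(n)] for u in range(n)]


def f(t):
    return 1 - 2 * t  # (-1)^theta on Z_2


path_edges = [(0, 1), (1, 2)]
Y_path = {(0, 1): 1, (1, 2): 0}  # generated by theta = (0, 1, 1) with p = 0
X_clean = bayes_estimator(3, path_edges, Y_path, [None] * 3, 2, 0.0, f)
X_noise = bayes_estimator(3, path_edges, Y_path, [None] * 3, 2, 1.0, f)
```

In the noise-free call on the path with vertices $0,1,2$, no label is revealed and ${\bm{\theta}}$ is identified only up to a global flip, yet the Bayes estimate equals ${\bm{X}}_{f}$ exactly, since $f(\theta_{u})f(\theta_{v})$ is flip-invariant. With $p=1$ the edges carry no information and the off-diagonal entries are $0$.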

Statistical estimation on graphs has motivated a substantial amount of work. In this context, the first example of a statistical model with a large information-computation gap is probably the planted clique problem [Jer92, AKV02]. This can be recast in the general framework described above, with $G_{n}$ the complete graph over $n$ vertices (see Section 3.1). Despite more than a quarter century of research, and the study of increasingly powerful classes of algorithms [FK00, DM13, BHK+16], no known polynomial-time algorithm comes close to saturating the information-theoretic limits for this problem.

In recent years, a much more refined picture of the information-computation gap has emerged, mainly through the careful analysis of a variety of models on sparse random graphs (as well as models on dense graphs in a different noise regime than the hidden clique model). We refer to Section 2 for a brief summary of this vast literature. In most of these models an information-computation gap is observed and has been precisely delineated. This gap is generally conjectured to remain unchanged if a small amount of side information is revealed, as in Eq. (1.2).¹ As mentioned above, however, most of the theoretical work has focused on random graphs (Erdős–Rényi random graphs, random regular graphs and their relatives). This motivates the following key question:

Does an information-computation gap exist for statistical estimation on other types of graphs?

¹ The careful reader will notice that this statement does not apply to the planted clique problem. If the labels of $\varepsilon n$ random vertices are revealed (i.e., whether or not they belong to the clique), then it is easy to find planted cliques of size $k\gg(1/\varepsilon)\log n$, i.e., far smaller than the cliques that the best known polynomial-time algorithms can find when $\varepsilon=0$. This behavior is, however, related to the fact that, in the planted clique problem, the labels' prior distribution depends strongly on $n$, as revealed by the fact that the clique's size is sublinear in $n$.

In this paper, we consider the case of graph sequences that converge locally to amenable graphs. Roughly, these are graphs for which the boundary of large sets of vertices is negligible compared to their volume. We refer to Section 3 for a reminder on the relevant definitions. Our results are already interesting for the simplest example of such graphs, namely large boxes $[1,L]\times\cdots\times[1,L]$ in the $d$-dimensional grid ${\mathbb{Z}}^{d}$ (with $L=n^{1/d}$).
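For these boxes the boundary-to-volume ratio can be computed explicitly. Taking $\partial S$ to be the edge boundary (an assumption made for this sketch; the precise definitions are deferred to Section 3), the box $[1,L]^{d}$ has $|\partial S|=2dL^{d-1}$ and $|S|=L^{d}$, so $|\partial S|/|S|=2d/L=2d\,n^{-1/d}\to 0$ as $n\to\infty$. The hypothetical helper `box_boundary_ratio` checks this by direct counting.

```python
import itertools


def box_boundary_ratio(L, d):
    """|dS| / |S| for the box S = [1, L]^d inside Z^d.

    dS is taken to be the edge boundary (edges of Z^d with exactly one endpoint
    in S); this convention is an assumption made for the illustration.
    """
    S = set(itertools.product(range(1, L + 1), repeat=d))
    boundary_edges = 0
    for x in S:
        for axis in range(d):
            for step in (-1, 1):
                y = list(x)
                y[axis] += step
                if tuple(y) not in S:
                    boundary_edges += 1  # this edge leaves the box
    return boundary_edges / len(S)


# In d = 2 the ratio is 4 / L: it halves each time the side length doubles.
ratios = {L: box_boundary_ratio(L, 2) for L in (5, 10, 20)}
```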

Our main finding is that no information-computation gap exists for such graphs (as long as the gap is defined in terms of polynomial- versus non-polynomial time algorithms). A specific formalization of this finding is given below, and proved in Section 4.

Theorem A.

Let $f:{\mathcal{X}}\to{\mathbb{R}}$ be a function with $\operatorname{\mathbb{E}}[f(\theta)]=0$ for $\theta\sim{\sf Unif}({\mathcal{X}})$. Let $G_{n}=(V_{n},E_{n})$ be a sequence of finite graphs (with $|V_{n}|=n$) converging locally–weakly to a random rooted graph $(G,o)$ which is infinite, locally-finite, almost surely anchored–amenable and tame. Then for each $l\in{\mathbb{N}}$ there exists an estimator $\widehat{{\bm{X}}}^{(l)}:{\mathcal{Y}}^{E_{n}}\times{\mathcal{X}}_{\star}^{V_{n}}\to{\mathbb{R}}^{n\times n}$, with runtime $\mathcal{O}(n^{2})$, such that the following holds. For almost every $\varepsilon>0$, we have

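$$\lim_{l\to\infty}\lim_{n\to\infty}\big\{\mathcal{R}_{n}(\widehat{{\bm{X}}}^{(l)};f)-\mathcal{R}_{n}^{\textup{Bayes}}(f)\big\}=0.$$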

The notions of local–weak convergence, anchored–amenability and tameness will be defined in Section 3. In more detail, we present the following contributions:

No information-computation gap on amenable graphs.

Theorem A provides a concrete formalization of the general statement that statistically optimal estimation can be performed using polynomial-time algorithms on (asymptotically) amenable graphs. In fact, we will prove that this follows from a more fundamental result, establishing that the vertex marginals of the posterior ${\mathbb{P}}\big{(}{\bm{\theta}}\;|Y^{(\varepsilon)}_{G_{n}}\big{)}$ can be computed to arbitrary accuracy in polynomial time, for almost all values of $\varepsilon$, on asymptotically amenable graphs, cf. Section 4.

Note that approximating the Bayes estimator $\widehat{{\bm{X}}}^{\text{Bayes}}$, Eq. (1.5), requires approximating the joint distribution of pairs of well-separated vertices. However, we will use a decoupling argument to reduce to the case of vertex marginals.
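A minimal sketch of the plug-in estimator that such a reduction leads to: each entry is the product of two vertex-wise posterior means $m_{u}=\sum_{x}\mu_{u}(x)f(x)$. Marginals are assumed to be given as dictionaries over the alphabet, and `decoupled_estimator` is a hypothetical name. Whether this product is close to $\operatorname{\mathbb{E}}[f(\theta_{u})f(\theta_{v})\,|\,Y^{(\varepsilon)}_{G_{n}}]$ is exactly what the decoupling argument has to establish; the code only shows the construction, whose cost is $\mathcal{O}(n|\mathcal{X}|)$ for the means plus $\mathcal{O}(n^{2})$ for the matrix.

```python
def decoupled_estimator(marginals, f, alphabet):
    """X_hat_uv = m_u * m_v, where m_u = sum_x mu_u(x) f(x) is a posterior mean.

    marginals[u] maps each x in `alphabet` to an (approximate) probability
    P(theta_u = x | observations); no pairwise joint marginals are needed.
    """
    m = [sum(mu[x] * f(x) for x in alphabet) for mu in marginals]
    return [[a * b for b in m] for a in m]


def f(x):
    return 1 - 2 * x  # (-1)^x on Z_2


marginals = [{0: 1.0, 1: 0.0}, {0: 0.5, 1: 0.5}, {0: 0.25, 1: 0.75}]
X_hat = decoupled_estimator(marginals, f, alphabet=(0, 1))
```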

Local algorithms.

Our proof that vertex marginals can be computed efficiently follows from an even stronger, and somewhat surprising, fact (as above, holding for almost all $\varepsilon$). The marginal at a vertex $v$ can be well approximated by the marginal with respect to the posterior given only the observations in a large constant-size ball centered at $v$. In other words, the marginal can be approximated by a local algorithm. The reason for this phenomenon can be explained in information-theoretic terms. We will prove that the average conditional mutual information between a random vertex $v$ in a region $S\subseteq V$ and the boundary of $S$, $I\big{(}\theta_{v};\theta_{\partial S}|Y_{S}^{(\varepsilon)}\big{)}$, is upper bounded by $|\partial S|/|S|$. Hence, for amenable graphs, the effect of the boundary information is generally negligible.
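The local approximation can be illustrated by brute force: compute the posterior of $\theta_{u}$ given only the observations on edges and vertices inside the ball $B(u,l)$. This is a sketch under stated assumptions, not the paper's implementation: it uses the $\mathbb{Z}_{q}$-synchronization channel, undirected adjacency lists, erased labels encoded as `None`, and the hypothetical names `ball` and `local_marginal`. For fixed $l$ on a bounded-degree graph the ball has bounded size, so the cost per vertex does not grow with $n$.

```python
import itertools
from collections import deque


def ball(adj, u, l):
    """Vertex set of B(u, l): all vertices within graph distance l of u."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        if dist[w] < l:  # expand only vertices strictly inside the ball
            for z in adj[w]:
                if z not in dist:
                    dist[z] = dist[w] + 1
                    queue.append(z)
    return set(dist)


def local_marginal(adj, Y, xi, u, l, q, p):
    """Posterior of theta_u given only the observations inside B(u, l).

    Assumes the Z_q-synchronization channel with noise level p; Y maps oriented
    edges (a, b) to observations and xi[w] is None (erased) or the revealed label.
    The enumeration costs q ** |B(u, l)|, independent of n for fixed l and degree.
    """
    B = sorted(ball(adj, u, l))
    pos = {w: i for i, w in enumerate(B)}
    local_edges = [(a, b) for (a, b) in Y if a in pos and b in pos]
    post = [0.0] * q
    for theta in itertools.product(range(q), repeat=len(B)):
        if any(xi[w] is not None and xi[w] != theta[pos[w]] for w in B):
            continue  # contradicts a revealed label inside the ball
        weight = 1.0
        for a, b in local_edges:
            match = Y[(a, b)] == (theta[pos[a]] - theta[pos[b]]) % q
            weight *= (1 - p) * match + p / q
        post[theta[pos[u]]] += weight
    Z = sum(post)
    return [w / Z for w in post]


# Path 0-1-2-3 with theta = (0, 1, 1, 0), noise-free edges, only theta_0 revealed.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
Y = {(0, 1): 1, (1, 2): 0, (2, 3): 1}
xi = [0, None, None, None]
mu_l1 = local_marginal(adj, Y, xi, u=2, l=1, q=2, p=0.0)
mu_l2 = local_marginal(adj, Y, xi, u=2, l=2, q=2, p=0.0)
```

With $l=1$ the ball around vertex $2$ does not contain the revealed vertex $0$, so the local posterior is uniform; with $l=2$ the revealed label propagates along the noise-free edges and determines $\theta_{2}$.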

Robust information-computation gap on random regular graphs.

We provide a counter-example by showing that the conclusions above do not hold for random regular graphs, which converge locally to $k$-regular trees, which are non-amenable. As mentioned above, several statistical estimation problems have been observed to present an information-computation gap when the underlying graph is random. While this gap is often expected to be robust to side information about the vertices, we are not aware of any result that explicitly establishes such robustness in the setting of the present paper. We consider the ${\mathbb{Z}}_{q}$-synchronization problem on random $k$-regular graphs. We prove that, for a large range of the model parameters and all $\varepsilon$ small enough: $(i)$ there exists a statistical estimator that achieves non-trivial reconstruction accuracy uniformly as $\varepsilon\to 0$; $(ii)$ local algorithms can only achieve accuracy that vanishes as $\varepsilon\to 0$.

2 Related literature

As mentioned in the introduction, large information-computation gaps have been observed in a number of statistical estimation problems where the underlying structure is a random graph, the complete graph, or close relatives. An incomplete list includes community detection in the stochastic block model [DKMZ11, Mas14, MNS18, Abb17], high-dimensional linear regression and generalized linear models [BKM+19, CM19], low-rank matrix estimation and sparse principal component analysis [JL09, AW09, BR13, MW15, LM17], tensor principal component analysis [MR14, HSS15, HKP+17], tensor decomposition, and so on.

In many of these models, two types of results are established. On the one hand, an ‘information-theoretic’ analysis allows one to characterize the optimal statistical accuracy achieved by an ideal estimator. On the other hand, specific classes of polynomial-time algorithms are analyzed. Sometimes the resulting statistical estimation limits are stated in terms of specific goals such as ‘weak recovery’ or ‘exact recovery’; in the present paper we consider the general goal of estimation with a certain expected accuracy, or risk.

The most frequently analyzed classes of algorithms have been spectral methods, local algorithms, and convex relaxations in the sum-of-squares hierarchy. A remarkable dichotomy has emerged from these works. Roughly speaking, in all the examples we know of, either highly sophisticated semidefinite programming hierarchies fail, or simple combinations of spectral methods and local algorithms succeed. The behavior of the latter is in turn characterized by studying the Bayes-optimal local algorithm (belief propagation) in the presence of a small amount of side information. Partial rationalizations of this surprising dichotomy were given in [HSSS16, HKP+17, FM17]. Motivated by this work, our analysis of ${\mathbb{Z}}_{q}$-synchronization on random regular graphs (Section 5) will focus on the same simple algorithm: belief propagation in the presence of side information. As is common in the literature, we will use the weak recovery threshold for this algorithm as a proxy for the fundamental algorithmic threshold.

Let us stress that our main focus is statistical estimation on amenable graphs. Versions of this problem have been studied in a few recent papers [AMM+17, SB18, PW18, AB18, ABRS18]. In particular, [AMM+17] proved the existence of a weak recovery threshold for ${\mathbb{Z}}_{q}$-synchronization on grids in $d\geq 3$ dimensions.² However, in contrast with random graphs, no explicit characterization exists (or is likely to exist) either for the optimal statistical accuracy or, in general, for the location of weak recovery thresholds. This poses a clear challenge: we want to prove that the optimal statistical accuracy can be achieved by polynomial-time algorithms, but we do not have an explicit characterization of the target accuracy. Indeed, our proof will be purely conceptual.

² For $d=2$, [AMM+17] proves that a threshold exists in the case $q=2$, and the same is expected to hold for $q\geq 3$ as well. For $d=1$ there is no non-trivial threshold, since weak recovery is always impossible.

Let us finally mention that it is well understood that certain algorithmic tasks are easy on graphs that can be embedded well in ${\mathbb{R}}^{d}$ (e.g., grids). For instance, approximately optimizing a function that decomposes as a sum of edge terms over a grid is easy, by partitioning the grid into large boxes. Unfortunately, these ideas do not have direct implications for the questions addressed in this paper. Even if we can find an approximate maximum-likelihood assignment of the unknown variables $\theta_{i}$, it is not guaranteed to have any good statistical properties, let alone achieve optimal estimation error. Inference and estimation do not reduce to optimization.

3 Background

3.1 Further examples

It is interesting to check that the framework defined in the introduction is broad enough to encompass a variety of models of interest.

Spiked Wigner and Wishart models. Low-rank plus noise models are ubiquitous in statistics and signal processing [Joh06], and can be recast in the language of the present paper. As an example, consider a signal vector ${\bm{\theta}}\in{\mathbb{R}}^{n}$ with i.i.d. components, and assume we observe the rank-one-plus-noise matrix ${Y}={\bm{\theta}}{\bm{\theta}}^{\top}+\sigma_{n}{\bm{W}}$. Here ${\bm{W}}$ is a noise matrix (for instance, with $W_{uv}\sim\mathcal{N}(0,1)$), and $\sigma_{n}$ controls the noise level.

We take $G_{n}$ to be the complete graph, and $(\theta_{u})_{u\in V_{n}}$ to be i.i.d. random variables from a distribution $P_{\theta}$ on $\mathbb{R}$.³ Observations on the edges are given by

$$Y_{uv}\sim Q(\,\cdot\,|\theta_{u},\theta_{v})=\mathcal{N}(\theta_{u}\theta_{v},\sigma_{n}^{2}),$$

where $\mathcal{N}(\mu,\sigma^{2})$ denotes the Gaussian distribution.

³ Unlike for the model described in the introduction, the variables $\theta_{u}$ typically take any value in ${\mathbb{R}}$, and their distribution is non-uniform. However, it is easy to reduce from one case to the other. For instance, we can let $Y_{uv}\sim\mathcal{N}(h(\theta_{u})h(\theta_{v}),\sigma^{2}_{n})$, choosing the nonlinear function $h:{\mathbb{R}}\to{\mathbb{R}}$ so that $h(\theta_{v})\sim P_{\theta}$ when $\theta_{v}\sim{\sf Unif}([0,1])$.

This example can be easily generalized. For instance, higher-rank models can be produced by taking $\theta_{u}\in{\mathbb{R}}^{r}$, with $r\geq 1$ fixed. Rectangular (non-symmetric) random matrices of dimensions $n_{1}\times n_{2}$ can also be produced by setting $n=n_{1}+n_{2}$. In this case $\theta_{v}=(\zeta_{v},b_{v})$, where $\zeta_{v}\in{\mathbb{R}}^{r}$ and $b_{v}\in\{1,2\}$ indicates whether $v$ belongs to the first $n_{1}$ vertices (left factor) or the last $n_{2}$ ones (right factor).
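The basic rank-one Gaussian model can be simulated in a few lines. This sketch assumes, for concreteness, a prior $P_{\theta}$ uniform on $\{-1,+1\}$, draws each unordered pair $u<v$ once, and omits diagonal entries; `spiked_wigner_observations` is a hypothetical name. It relies on `random.Random.gauss(mu, sigma)` from the Python standard library.

```python
import random


def spiked_wigner_observations(theta, sigma, rng):
    """Y_uv ~ N(theta_u * theta_v, sigma^2), independently for each pair u < v."""
    n = len(theta)
    return {
        (u, v): rng.gauss(theta[u] * theta[v], sigma)
        for u in range(n)
        for v in range(u + 1, n)
    }


rng = random.Random(0)
theta = [rng.choice((-1, 1)) for _ in range(5)]  # assumed prior: uniform signs
Y = spiked_wigner_observations(theta, sigma=0.5, rng=rng)
```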

Community detection. The stochastic block model is a popular model for community detection in networks. The model is parametrized by a symmetric ‘connectivity’ matrix $(c_{rs})_{1\leq r,s\leq q}$, whereby $c_{r,s}\in[0,1]$ is the expected edge density between vertices in communities $r$ and $s$. (For the sake of simplicity, we consider here the ‘balanced’ case in which the $q$ communities all have equal expected size.) Each vertex $v\in V_{n}$ is assigned a label $\theta_{v}\in[q]$ independently and uniformly at random. Conditional on ${\bm{\theta}}$, we generate a graph $\tilde{G}_{n}=(V_{n},\tilde{E}_{n})$ by connecting vertices $u,v$ independently with probability ${\mathbb{P}}((u,v)\in\tilde{E}_{n}|{\bm{\theta}})=c_{\theta_{u},\theta_{v}}$.

We can encode this model in our general framework as follows. The graph $G_{n}$ is the complete graph, and we observe $Y_{uv}\in\{0,1\}$ on every edge, where $Q(Y_{uv}=1|\theta_{u}=r,\theta_{v}=s)=c_{r,s}$. The connection with the standard description is given by the correspondence $\{Y_{uv}=1\}\;\;\Leftrightarrow\;\;\{(u,v)\in\tilde{E}_{n}\}$. The same encoding can be used for the planted clique problem.
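Under this encoding, sampling the model amounts to drawing one Bernoulli observation per pair. In this sketch, communities are labelled $0,\dots,q-1$ instead of $1,\dots,q$, each unordered pair is observed once, and `sbm_observations` is a hypothetical name; an observation $Y_{uv}=1$ is read as the edge $(u,v)$ being present in $\tilde{G}_{n}$.

```python
import random


def sbm_observations(theta, c, rng):
    """Y_uv in {0, 1} for each pair u < v, with P(Y_uv = 1 | theta) = c[theta_u][theta_v].

    Communities are labelled 0, ..., q - 1; Y_uv = 1 means (u, v) is an edge of G~_n.
    """
    n = len(theta)
    return {
        (u, v): int(rng.random() < c[theta[u]][theta[v]])
        for u in range(n)
        for v in range(u + 1, n)
    }


rng = random.Random(0)
c = [[0.5, 0.1], [0.1, 0.5]]  # assumed assortative two-community example
theta = [rng.randrange(2) for _ in range(6)]
Y = sbm_observations(theta, c, rng)
observed_edges = [pair for pair, y in Y.items() if y == 1]
```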

Let us note that although the above models are special cases of our framework, in the rest of the paper we will focus on graphs whose local–weak limit (to be defined shortly) is locally finite. This rules out graphs with diverging typical degree (in particular, the complete graph).

3.2 Local–weak convergence and amenability

For the reader’s convenience, we collect here some relevant graph-theoretic definitions, referring to [BS01, AL07, LP17] for more details. In this paper, all graphs have a finite or countably infinite vertex set, are connected, and are locally finite, i.e., all vertices have finite degree. A rooted graph $(G,o)$ is a graph $G$ together with a choice of a vertex $o\in V(G)$, called the root of $G$. We say that two rooted graphs $(G,o)$ and $(G^{\prime},o^{\prime})$ are isomorphic, and we write $(G,o)\equiv(G^{\prime},o^{\prime})$, if there exists an edge–preserving and root–preserving bijective map $\phi:V(G)\to V(G^{\prime})$, i.e., $(u,v)\in E(G)\Leftrightarrow(\phi(u),\phi(v))\in E(G^{\prime})$ and $\phi(o)=o^{\prime}$. For an integer $l\geq 0$, define $[G,o]_{l}$ to be the rooted subgraph spanned by the ball of radius $l$ around the root $o$ in $G$: this is the rooted graph $((V_{l},E_{l}),o)$ where $V_{l}=B_{G}(o,l):=\{u\in V(G):d_{G}(o,u)\leq l\}$ and $E_{l}=\{(u,v)\in E:u,v\in V_{l}\}$. Here, $d_{G}$ is the graph distance in $G$.
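Note that $[G,o]_{l}$ keeps every edge of $G$ between vertices of the ball, not just a breadth-first search tree. A short sketch, assuming an undirected graph given as adjacency lists with comparable vertex labels; `rooted_ball` and `cycle` are hypothetical helpers.

```python
from collections import deque


def rooted_ball(adj, o, l):
    """Return [G, o]_l as (V_l, E_l, o).

    V_l = B_G(o, l) and E_l contains every edge of G with both endpoints in V_l.
    `adj` gives undirected adjacency lists; each edge is reported once as (u, v), u < v.
    """
    dist = {o: 0}
    queue = deque([o])
    while queue:
        w = queue.popleft()
        if dist[w] < l:
            for z in adj[w]:
                if z not in dist:
                    dist[z] = dist[w] + 1
                    queue.append(z)
    V = set(dist)
    E = {(u, v) for u in V for v in adj[u] if v in V and u < v}
    return V, E, o


def cycle(n):
    """Adjacency lists of the n-cycle 0 - 1 - ... - (n-1) - 0."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


V, E, root = rooted_ball(cycle(10), 0, 2)
V4, E4, _ = rooted_ball(cycle(4), 0, 2)
```

On the $4$-cycle with $l=2$ the ball is the whole graph, and $E_{l}$ contains all four edges, including the one that closes the cycle and that a breadth-first search tree would omit.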

Definition 3.1.

A sequence of rooted graphs $(G_{n},o_{n})_{n\geq 1}$ is said to converge locally to a rooted graph $(G,o)$, and we write $(G_{n},o_{n})\xrightarrow[]{loc.}(G,o)$, if for every radius $l\geq 0$ there exists $n_{0}\geq 0$ such that $[G_{n},o_{n}]_{l}\equiv[G,o]_{l}$ for all $n\geq n_{0}$.

This notion of convergence endows the set $\mathcal{G}_{*}$ (of $\equiv$ –equivalence classes) of rooted graphs with a metrizable topology, called the topology of local, or Benjamini–Schramm, convergence [BS01]. This gives $\mathcal{G}_{*}$ the structure of a complete separable metric space. Now we can define $\mathscrsfs{P}(\mathcal{G}_{*})$ , the space of probability measures on $\mathcal{G}_{*}$ when endowed with its Borel $\sigma$ –algebra. Then we endow $\mathscrsfs{P}(\mathcal{G}_{*})$ with the usual topology of weak convergence.

From a finite deterministic graph $G$ , we can construct a random rooted graph $(G,o)$ by choosing the root $o$ uniformly at random from $V(G)$ . We denote the law of this random rooted graph by $\rho_{G}\in\mathscrsfs{P}(\mathcal{G}_{*})$ .

Definition 3.2.

A sequence of finite graphs $(G_{n})_{n\geq 1}$ is said to converge locally–weakly to a random rooted graph $(G,o)$ if the sequence of probability measures $(\rho_{G_{n}})_{n\geq 1}$ converges weakly to a probability measure $\rho\in\mathscrsfs{P}(\mathcal{G}_{*})$ , which is the law of $(G,o)$ .

In other words, the definition requires that given a fixed finite connected rooted graph $(H,o^{\prime})$ and a fixed radius $l$ , the probability $\operatorname{\mathbb{P}}\big{(}[G_{n},o_{n}]_{l}\equiv(H,o^{\prime})\big{)}$ converges to $\operatorname{\mathbb{P}}\big{(}[G,o]_{l}\equiv(H,o^{\prime})\big{)}$ as $n\to\infty$ .

Probability measures $\rho\in\mathscrsfs{P}(\mathcal{G}_{*})$ that are local–weak limits of sequences of finite graphs as per Definition 3.2 (such measures are called sofic in the literature) inherit an important stationarity property, which roughly expresses the intuition that the random graph $G$ should “look the same” when viewed from any of its vertices. A formal definition takes the form of a mass–transport principle termed unimodularity [AL07]. Similarly to $\mathcal{G}_{*}$ , we define $\mathcal{G}_{**}$ as the space of $\equiv$ –equivalence classes of doubly–rooted graphs $(G,o,o^{\prime})$ , where the isomorphism relation $\equiv$ and local convergence as per Definition 3.1 are both extended in the natural way.

Definition 3.3.

A measure $\rho\in\mathscrsfs{P}(\mathcal{G}_{*})$ is unimodular if for every Borel function $f:\mathcal{G}_{**}\to\mathbb{R}_{+}$ ,

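$$\operatorname{\mathbb{E}}\Big[\sum_{v\in V(G)}f(G,o,v)\Big]=\operatorname{\mathbb{E}}\Big[\sum_{v\in V(G)}f(G,v,o)\Big]\,,$$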

when $(G,o)\sim\rho$ .

It is clear that if $G$ is finite then $\rho_{G}$ is unimodular, since the root is chosen uniformly at random. Furthermore, the property of unimodularity is closed in the topology of local–weak convergence [AL07], hence all local–weak limits of sequences of finite graphs are unimodular.

Next, we define the key concept of anchored–amenability.

Definition 3.4.

An infinite rooted graph $(G,o)$ where $G=(V,E)$ is said to be anchored–amenable if its Cheeger constant anchored at $o$ is zero:

$$\inf\Big\{\frac{|\partial S|}{|S|}\,:\ o\in S\subseteq V,\ |S|<\infty\Big\}=0\,.$$

Here, $\partial S=\{u\in S:\exists v\notin S,(u,v)\in E\}$ is the vertex-boundary of the set $S\subseteq V$ .

We will informally use the phrase ‘asymptotically amenable’ to refer to graph sequences that converge locally–weakly to almost surely anchored–amenable graphs.

Observe that if $G$ is vertex–transitive, the above statement does not depend on the root $o$ , and anchored–amenability reduces to the more classical notion of amenability of (non-rooted) graphs. For instance, the Euclidean lattice $\mathbb{Z}^{d}$ is amenable, while the $k$ -regular tree with $k\geq 3$ is not (both graphs being transitive).
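To make the contrast concrete, the following Python sketch computes the ratio $|\partial S|/|S|$ (with the vertex boundary of Definition 3.4) for $\ell_{\infty}$ balls in $\mathbb{Z}^{2}$ and for balls around the root of the $3$-regular tree. Balls are only one family of test sets, so this illustrates, rather than proves, the amenability of $\mathbb{Z}^{2}$ and the non-amenability of the tree; all function names are hypothetical.

```python
def boundary_ratio(S, neighbors):
    """Return |dS| / |S|, where dS = {u in S : u has a neighbor outside S} is the vertex boundary."""
    S = set(S)
    boundary = [u for u in S if any(v not in S for v in neighbors(u))]
    return len(boundary) / len(S)


def grid_neighbors(u):
    """Nearest neighbors of u = (x, y) in the square lattice Z^2."""
    x, y = u
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def grid_box(l):
    """l_infinity ball of radius l around the origin of Z^2."""
    return [(x, y) for x in range(-l, l + 1) for y in range(-l, l + 1)]


def tree_neighbors(u, k):
    """Neighbors in the k-regular tree; vertices are tuples of child indices and the root is ()."""
    if u == ():
        return [(i,) for i in range(k)]
    return [u[:-1]] + [u + (j,) for j in range(k - 1)]


def tree_ball(l, k):
    """Ball of radius l around the root of the k-regular tree, built level by level."""
    ball, level = [()], [()]
    for _ in range(l):
        level = [v for u in level for v in tree_neighbors(u, k) if len(v) > len(u)]
        ball.extend(level)
    return ball


for l in (5, 20, 80):
    print("Z^2 box", l, boundary_ratio(grid_box(l), grid_neighbors))
for l in (2, 6, 10):
    print("3-regular tree ball", l, boundary_ratio(tree_ball(l, 3), lambda u: tree_neighbors(u, 3)))
```

For $\mathbb{Z}^{2}$ the ratio equals $8\ell/(2\ell+1)^{2}\approx 2/\ell$, whereas for balls of the $k$-regular tree it tends to $(k-2)/(k-1)$, i.e. $1/2$ for $k=3$.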

Observe that if $(G,o)\sim\rho$ is almost surely anchored–amenable, there exists a sequence of finite sets $S_{k}\subset V$ such that $o\in S_{k}$ and which ‘witnesses’ the amenability of $G$ : $|\partial S_{k}|/|S_{k}|\longrightarrow 0$ as $k\to\infty$ . Moreover, this random sequence can be chosen in a measurable way as a function of the rooted graph $(G,o)$ . Indeed, we can for instance label the vertices of $G$ by ${\mathbb{N}}$ , the root being labelled by [math], and for every $k\geq 1$ , choose the first finite set $S_{k}\subseteq V(G)$ (among countably many) in the lexicographic ordering such that $|\partial S_{k}|/|S_{k}|\leq 2^{-k}$ and $o\in S_{k}$ . For clarity we make this dependence explicit: $S_{k}=S_{k}(G,o)$ . We require a technical condition regarding such sets $S_{k}$ .

Definition 3.5.

We say that $\rho\in\mathscrsfs{P}(\mathcal{G}_{*})$ is tame if it is supported on anchored–amenable rooted graphs, and there exists a sequence $\{S_{k}\}_{k\geq 1}$ of sets that witnesses anchored–amenability (i.e., such that $S_{k}(G,o)$ is a measurable function of $(G,o)$ , and $|\partial S_{k}|/|S_{k}|\to 0$ almost surely) such that the following holds. For every $\eta>0$ there exists $\delta>0$ such that

[TABLE]

By extension, we say that the random rooted graph $(G,o)$ is tame if its law $\rho$ is tame.

Intuitively, tameness holds when the sets $S_{k}(G,u)$ attached to the vertices $u$ near the root have size comparable with $|S_{k}(G,o)|$ . To discuss it further, it is useful to introduce the random variables

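$$\alpha_{k}(G,o):=\sum_{u\in V(G)}\frac{{\mathbf{1}}_{o\in S_{k}(G,u)}}{|S_{k}(G,u)|}\,,\qquad k\geq 1\,.$$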

The tameness condition requires a uniform upper bound on the lower tail of $(\alpha_{k}(G,o))_{k\geq 1}$ . An equivalent way to express this condition is to say that the sequence of random variables $(1/\alpha_{k}(G,o))_{k\geq 1}$ is tight when $(G,o)\sim\rho$ .

Note that $\operatorname{\mathbb{E}}_{\rho}[\alpha_{k}(G,o)]=1$ whenever $\rho$ is unimodular. Indeed, by a direct application of the mass-transport principle (for the function $f(G,o,u)=\frac{{\mathbf{1}}_{o\in S_{k}(G,u)}}{|S_{k}(G,u)|}$ )

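$$\operatorname{\mathbb{E}}_{\rho}[\alpha_{k}(G,o)]=\operatorname{\mathbb{E}}_{\rho}\Big[\sum_{u\in V(G)}\frac{{\mathbf{1}}_{o\in S_{k}(G,u)}}{|S_{k}(G,u)|}\Big]=\operatorname{\mathbb{E}}_{\rho}\Big[\sum_{u\in V(G)}\frac{{\mathbf{1}}_{u\in S_{k}(G,o)}}{|S_{k}(G,o)|}\Big]=1\,.$$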
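On a finite graph with a uniformly random root, the identity $\operatorname{\mathbb{E}}_{\rho}[\alpha_{k}(G,o)]=1$ is a double-counting argument: every vertex $u$ sends total mass one, spread uniformly over $S_{k}(G,u)$, and $\alpha_{k}(G,o)=\sum_{u}f(G,o,u)$ is the mass received by $o$. The Python sketch below checks this with exact rational arithmetic on a path graph, where $\alpha_{k}$ is not constant; taking graph balls of radius $r$ as the sets $S_{k}$ is a hypothetical choice made only for illustration.

```python
from fractions import Fraction


def path_balls(n, r):
    """Hypothetical choice of the sets S(u): graph balls of radius r in the path 0 - 1 - ... - (n-1)."""
    return {u: set(range(max(0, u - r), min(n, u + r + 1))) for u in range(n)}


def alpha(S, o):
    """alpha(G, o) = sum_u 1{o in S(u)} / |S(u)|: the total mass received by o."""
    return sum(Fraction(1, len(S[u])) for u in S if o in S[u])


n, r = 10, 2
S = path_balls(n, r)
alphas = [alpha(S, o) for o in range(n)]
print(alphas[0], alphas[5], sum(alphas) / n)   # 47/60 1 1
```

Vertices near the ends of the path receive less mass than interior vertices, yet the average over a uniformly chosen root is exactly one.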

Moreover, tameness is satisfied if $\rho$ is supported on vertex-transitive graphs, and in this case $\alpha_{k}(G,o)=1$ almost surely. Indeed, assume $\rho$ is supported on a single vertex-transitive graph. Then $\rho$ is unimodular whence $\operatorname{\mathbb{E}}_{\rho}[\alpha_{k}(G,o)]=1$ , but $\alpha_{k}(G,o)$ is non-random and therefore $\alpha_{k}(G,o)=1$ . In the general case where $\rho$ is not an atom, since $\alpha_{k}(G,o)=1$ almost surely conditional on $(G,o)$ , we have $\alpha_{k}(G,o)=1$ almost surely unconditionally as well.

We next provide a few examples of graphs that are anchored-amenable and tame.

Example 1 (Percolation clusters). Consider the $d$ -dimensional grid $\mathbb{L}^{d}$ , i.e., $V(\mathbb{L}^{d})={\mathbb{Z}}^{d}$ , and edges connect vertices at distance one: $E(\mathbb{L}^{d})=\{({\bm{x}},{\bm{y}})\in{\mathbb{Z}}^{d}\times{\mathbb{Z}}^{d}:\;\|{\bm{x}}-{\bm{y}}\|=1\}$ . Remove edges independently with probability $1-p$ and let $G=G_{p}(o)$ be the connected component of the origin $o={\bm{0}}$ . We consider $p>p_{c}$ , the percolation threshold on $\mathbb{L}^{d}$ , so that $G$ is infinite with positive probability, and condition on the event that $G$ is indeed infinite. In this case we can take $S_{k}(G,{\bm{x}})$ to be the subset of vertices contained in the $\ell_{\infty}$ ball of radius $\ell_{k}$ around ${\bm{x}}$ : $S_{k}(G,{\bm{x}})=\{{\bm{y}}\in V(G):\|{\bm{y}}-{\bm{x}}\|_{\infty}\leq\ell_{k}\}$ , for a deterministic sequence of radii $\ell_{k}\uparrow\infty$ . A classical result of Newman and Schulman [NS81] implies $|S_{k}(G,o)|/\ell_{k}^{d}\to c_{0}$ almost surely for some non-random constant $c_{0}>0$ . Further, $\partial S_{k}(G,o)\subseteq\{{\bm{x}}\in{\mathbb{Z}}^{d}:\ell_{k}-1\leq\|{\bm{x}}\|_{\infty}\leq\ell_{k}\}$ , whence $|\partial S_{k}(G,o)|\leq c_{1}\ell_{k}^{d-1}$ . Hence, almost surely there exists a (random) $k_{0}<\infty$ such that $|\partial S_{k}|/|S_{k}|\leq(2c_{1}/c_{0})\,\ell_{k}^{-1}$ for all $k\geq k_{0}$ , and the right-hand side tends to $0$ .

Further, $|S_{k}(G,{\bm{x}})|\leq(2\ell_{k}+1)^{d}\leq c_{2}\ell_{k}^{d}$ for every ${\bm{x}}\in V(G)$ (e.g., with $c_{2}=3^{d}$ ), and $o\in S_{k}(G,{\bm{x}})$ if and only if ${\bm{x}}\in S_{k}(G,o)$ . Therefore

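$$\alpha_{k}(G,o)=\sum_{{\bm{x}}\in S_{k}(G,o)}\frac{1}{|S_{k}(G,{\bm{x}})|}\geq\frac{|S_{k}(G,o)|}{c_{2}\ell_{k}^{d}}\,.$$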

Therefore $\underset{k\to\infty}{\liminf}\alpha_{k}(G,o)\geq c_{0}/c_{2}>0$ a.s., whence $\rho\big{(}\alpha_{k}(G,o)\leq\delta\big{)}\to 0$ for all $\delta<c_{0}/c_{2}$ .

Example 2 (Random geometric graph). In this case the vertices are the points of a Poisson point process on ${\mathbb{R}}^{d}$ with constant intensity $\gamma$ . Any two vertices ${\bm{x}},{\bm{y}}$ are connected by an edge if and only if $\|{\bm{x}}-{\bm{y}}\|_{2}\leq r$ for a fixed radius $r>0$ . We choose the root $o\in V(G)$ as the closest vertex to the origin ${\bm{0}}$ and let $G$ be the connected component of $o$ . This graph is infinite with positive probability provided $\gamma$ is larger than the percolation threshold $\gamma_{c}$ for this model [P+03].

The calculations for Bernoulli bond percolation on ${\mathbb{Z}}^{d}$ can be applied almost verbatim to the random geometric graph. In particular, letting $S_{k}(G,o)=\{{\bm{x}}\in V(G):\,\|{\bm{x}}\|_{\infty}\leq\ell_{k}\}$ witnesses anchored–amenability and satisfies the tameness assumption.

4 Results for asymptotically amenable graphs

Recall that $Y^{(\varepsilon)}_{G}$ refers to the union of the vertex- and edge-observations over $G$ : $Y^{(\varepsilon)}_{G}\>=\{Y_{uv}:(u,v)\in E,\xi^{(\varepsilon)}_{u}:u\in V\}$ . A natural way to construct an estimator $\hat{\bm{\theta}}$ is to first estimate the posterior marginals of ${\bm{\theta}}$ given $Y^{(\varepsilon)}_{G_{n}}$ at every vertex:

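$$\mu_{G_{n},u}(x):=\operatorname{\mathbb{P}}\big(\theta_{u}=x\,\big|\,Y^{(\varepsilon)}_{G_{n}}\big)\,,\qquad x\in{\mathcal{X}}\,.$$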

Letting $(\widehat{\mu}_{u})_{u\in G_{n}}$ be such estimates of the posterior marginals, we can construct $\hat{\bm{\theta}}$ , for instance, by independently sampling from the marginals: $\hat{\theta}_{u}\sim_{\text{ind}}\widehat{\mu}_{u}$ , for all $u\in V_{n}$ .

Of course, computing the exact posterior probabilities $\mu_{G_{n},u}(x)$ is in general intractable. As a tractable alternative, we can compute a local version of the vertex marginals by using only observations in a ball of radius $l$ around each vertex. For $u\in V_{n}$ and $x\in{\mathcal{X}}$ , let

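$$\widehat{\mu}_{G_{n},u,l}(x):=\operatorname{\mathbb{P}}\big(\theta_{u}=x\,\big|\,Y^{(\varepsilon)}_{B_{G_{n}}(u,l)}\big)\,.$$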

(Recall that $B_{G_{n}}(u,l)=\{v\in V:d_{G_{n}}(u,v)\leq l\}$ denotes the set of vertices within graph distance $l$ from $u$ in $G_{n}$ .) The local marginals $\widehat{\mu}_{G_{n},u,l}(x)$ can be computed with complexity at most $|{\mathcal{X}}|^{|B_{G_{n}}(u,l)|}$ per vertex, up to factors polynomial in $|B_{G_{n}}(u,l)|$ . Under additional assumptions, the complexity of estimating all the vertex marginals is linear or nearly linear in $n$ . In particular:

•

If $G_{n}$ has degree bounded by $k_{\max}$ independently of $n$ , then $|B_{G_{n}}(u,l)|\leq k_{\max}^{l+1}$ .

•

If $G_{n}$ converges locally–weakly to a locally finite unimodular graph, then

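$$\lim_{M\to\infty}\limsup_{n\to\infty}\frac{1}{n}\Big|\big\{u\in V_{n}:\ |B_{G_{n}}(u,l)|>M\big\}\Big|=0\qquad\text{for every fixed }l\geq 0\,.$$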

In other words, for each $\delta>0$ , there exists $M(\delta)$ such that, for all $n$ large enough, all but a fraction $\delta$ of the vertices $u$ have a neighborhood $B_{G_{n}}(u,l)$ of size at most $M(\delta)$ . Hence $\mu_{G_{n},u}(x)$ can be estimated for all but a fraction $\delta$ of the vertices in linear time.
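As a concrete sketch of the local estimator, the following Python code evaluates $\widehat{\mu}_{G_{n},u,l}$ by brute-force enumeration of the $|{\mathcal{X}}|^{|B_{G_{n}}(u,l)|}$ assignments on the ball, for $\mathbb{Z}_{q}$ –synchronization ( $Y_{uv}=\theta_{u}-\theta_{v}$ mod $q$ with probability $1-p$, uniform otherwise) with erasure side information. Under the uniform prior, the side-information likelihood is constant over assignments consistent with the revealed labels, so only the edge factors inside the ball matter. The function names and data layout are hypothetical.

```python
import itertools


def zq_channel(y, xu, xv, p, q):
    """Q(y | theta_u = xu, theta_v = xv): y = xu - xv (mod q) w.p. 1 - p, uniform on Z_q w.p. p."""
    return (1 - p) * (y == (xu - xv) % q) + p / q


def ball(adj, u, l):
    """Vertices at graph distance at most l from u (breadth-first search)."""
    seen, frontier = {u}, [u]
    for _ in range(l):
        nxt = []
        for v in frontier:
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    nxt.append(w)
        frontier = nxt
    return sorted(seen)


def local_marginal(adj, Y, xi, u, l, p, q):
    """Posterior of theta_u given the observations inside B(u, l), by enumerating q^|B| assignments.

    Y maps each edge (v, w) to its observed value; xi[v] is the revealed label of v, or None if erased.
    """
    B = ball(adj, u, l)
    inside = set(B)
    edges = [e for e in Y if e[0] in inside and e[1] in inside]
    weights = [0.0] * q
    for labels in itertools.product(range(q), repeat=len(B)):
        theta = dict(zip(B, labels))
        if any(xi[v] is not None and xi[v] != theta[v] for v in B):
            continue  # assignment contradicts revealed side information
        w = 1.0  # uniform prior and erasure likelihoods are constant over consistent assignments
        for (a, b) in edges:
            w *= zq_channel(Y[(a, b)], theta[a], theta[b], p, q)
        weights[theta[u]] += w
    total = sum(weights)
    return [w / total for w in weights]


adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}           # path 0 - 1 - 2 - 3
Y = {(0, 1): 1, (1, 2): 0, (2, 3): 2}
xi = {0: None, 1: 2, 2: None, 3: None}                  # only theta_1 = 2 is revealed
print(local_marginal(adj, Y, xi, u=0, l=1, p=0.0, q=3))   # [1.0, 0.0, 0.0]
```

With a noiseless channel, the revealed label at vertex $1$ and $Y_{01}=1$ force $\theta_{0}=0$; with $l=0$ the ball contains no edges and the local marginal is uniform.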

Notice that we can safely neglect $o(n)$ atypical vertices for our purposes. For instance, the matrix estimation risk (1.4) is bounded away from zero in the present setting (unless the channel $Q$ is noiseless), and therefore ignoring $o(n)$ vertices has a negligible impact on the asymptotic risk.

Do the local estimates $\widehat{\mu}_{G_{n},u,l}$ provide good approximations of the actual marginals $\mu_{G_{n},u}$ ? Our first result shows that this is the case for asymptotically amenable graphs, for almost all $\varepsilon>0$ , and on average over vertices in $G_{n}$ .

Theorem B.

Let $G_{n}=(V_{n},E_{n})$ be a sequence of finite graphs (with $|V_{n}|=n$ ) that converges locally–weakly to a random rooted graph $(G,o)\sim\rho$ which is almost surely anchored–amenable and tame. Then for almost every $\varepsilon>0$ ,

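$$\lim_{l\to\infty}\limsup_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\,d_{\mbox{\rm\tiny TV}}\big(\widehat{\mu}_{G_{n},u,l},\mu_{G_{n},u}\big)=0\,.$$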

The proof of this theorem follows from a technical result which we will present next.

We define an observation model $({\bm{\theta}},{Y},\xi^{(\varepsilon)})$ on the infinite random graph $G$ exactly as for the finite graphs $G_{n}$ . We then let

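$$\mu_{G,o}(x):=\operatorname{\mathbb{P}}\Big(\theta_{o}=x\,\Big|\,(G,o),\ \sigma\big(Y^{(\varepsilon)}_{B_{G}(o,l)}:\,l\geq 0\big)\Big)\,,$$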

where we condition on the realization of the rooted graph and on the $\sigma$ -algebra generated by the sequence of random variables $\big{(}Y^{(\varepsilon)}_{B_{G}(o,l)}\big{)}_{l\geq 0}$ . Equivalently, we can also define $\mu_{G,o}(x)$ as the almost-sure limit of the sequence $\big{(}\operatorname{\mathbb{P}}(\theta_{o}=x|(G,o),Y^{(\varepsilon)}_{B_{G}(o,l)})\big{)}_{l\geq 0}$ , where convergence is guaranteed by Lévy’s upward theorem. We have the following general relation between marginals on the finite graphs $G_{n}$ and marginals on the infinite rooted graph $(G,o)$ .

Proposition 4.1.

Under the conditions of Theorem B, we have for all $x\in{\mathcal{X}}$ and almost every $\varepsilon>0$ ,

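$$\lim_{l\to\infty}\lim_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)^{2}\big]=\operatorname{\mathbb{E}}\big[\mu_{G,o}(x)^{2}\big]\,,\tag{4.3}$$
$$\lim_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\big[\mu_{G_{n},u}(x)^{2}\big]=\operatorname{\mathbb{E}}\big[\mu_{G,o}(x)^{2}\big]\,.\tag{4.4}$$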

(The expectation on the right-hand side is w.r.t. the randomness of $Y^{(\varepsilon)}_{G}$ and $(G,o)$ .)

The proof of Proposition 4.1 is presented in Section 6. Theorem B is a consequence of Proposition 4.1 as shown below.

Proof of Theorem B.

We claim that $\operatorname{\mathbb{E}}\big{[}\widehat{\mu}_{G_{n},u,l}(x)\mu_{G_{n},u}(x)\big{]}=\operatorname{\mathbb{E}}\big{[}\widehat{\mu}_{G_{n},u,l}(x)^{2}\big{]}$ . Indeed, by conditioning on $Y^{(\varepsilon)}_{B_{G_{n}}(u,l)}$ we obtain

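$$\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)\mu_{G_{n},u}(x)\big]=\operatorname{\mathbb{E}}\Big[\widehat{\mu}_{G_{n},u,l}(x)\,\operatorname{\mathbb{E}}\big[\mu_{G_{n},u}(x)\,\big|\,Y^{(\varepsilon)}_{B_{G_{n}}(u,l)}\big]\Big]=\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)^{2}\big]\,.$$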

Now we use the fact that for two measures $\mu$ and $\nu$ on ${\mathcal{X}}$ , $d_{\mbox{\rm\tiny TV}}(\mu,\nu)=\frac{1}{2}\|\mu-\nu\|_{\ell_{1}}\leq\frac{1}{2}\sqrt{|{\mathcal{X}}|}\cdot\|\mu-\nu\|_{\ell_{2}}$ :

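$$\operatorname{\mathbb{E}}\,d_{\mbox{\rm\tiny TV}}\big(\widehat{\mu}_{G_{n},u,l},\mu_{G_{n},u}\big)\leq\frac{\sqrt{|{\mathcal{X}}|}}{2}\Big(\sum_{x\in{\mathcal{X}}}\operatorname{\mathbb{E}}\big[(\widehat{\mu}_{G_{n},u,l}(x)-\mu_{G_{n},u}(x))^{2}\big]\Big)^{1/2}=\frac{\sqrt{|{\mathcal{X}}|}}{2}\Big(\sum_{x\in{\mathcal{X}}}\operatorname{\mathbb{E}}\big[\mu_{G_{n},u}(x)^{2}\big]-\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)^{2}\big]\Big)^{1/2}\,.$$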

(Here and below $\|\mu-\nu\|_{\ell_{p}}$ denotes the $\ell_{p}$ norm of the vector $(\mu(x)-\nu(x))_{x\in{\mathcal{X}}}$ .) The claim follows by averaging over $u\in V_{n}$ , and applying Proposition 4.1. $\blacksquare$
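The key step of this proof, $\operatorname{\mathbb{E}}[\widehat{\mu}_{G_{n},u,l}(x)\mu_{G_{n},u}(x)]=\operatorname{\mathbb{E}}[\widehat{\mu}_{G_{n},u,l}(x)^{2}]$, is the tower property of conditional expectation. The following Python sketch verifies it exactly on a toy instance, chosen hypothetically for illustration: the path $0-1-2-3$, $\mathbb{Z}_{2}$ –synchronization with erasure side information, $u=0$ and $l=1$. It enumerates all observations and is a sanity check only, not part of the argument.

```python
import itertools


def tower_identity_check(p=0.3, eps=0.4, x=0):
    """Exact values of E[1], E[mu_hat * mu] and E[mu_hat^2] for u = 0, l = 1 on the path 0 - 1 - 2 - 3."""
    q, n = 2, 4
    edges = [(0, 1), (1, 2), (2, 3)]
    ball_edges, ball_vertices = [(0, 1)], [0, 1]          # B(0, 1) on this path
    thetas = list(itertools.product(range(q), repeat=n))

    def Q(y, a, b):                                       # Z_q synchronization channel
        return (1 - p) * (y == (a - b) % q) + p / q

    def P_xi(s, a):                                       # erasure side information; None = erased
        return 1 - eps if s is None else eps * (s == a)

    def weight(theta, Y, xi, E, V):                       # likelihood of the data on (E, V)
        w = 1.0
        for (a, b) in E:
            w *= Q(Y[(a, b)], theta[a], theta[b])
        for v in V:
            w *= P_xi(xi[v], theta[v])
        return w

    total = cross = square = 0.0
    for ys in itertools.product(range(q), repeat=len(edges)):
        Y = dict(zip(edges, ys))
        for xi in itertools.product([None, *range(q)], repeat=n):
            joint = [weight(t, Y, xi, edges, range(n)) / q**n for t in thetas]
            p_obs = sum(joint)
            if p_obs == 0.0:
                continue
            mu = sum(j for j, t in zip(joint, thetas) if t[0] == x) / p_obs       # full posterior
            loc = [weight(t, Y, xi, ball_edges, ball_vertices) for t in thetas]
            mu_hat = sum(w for w, t in zip(loc, thetas) if t[0] == x) / sum(loc)  # ball-only posterior
            total += p_obs
            cross += p_obs * mu_hat * mu
            square += p_obs * mu_hat ** 2
    return total, cross, square


total, cross, square = tower_identity_check()
print(total, cross, square)
```

The two expectations agree up to floating-point round-off, which is exactly what makes $\operatorname{\mathbb{E}}[(\mu-\widehat{\mu})^{2}]=\operatorname{\mathbb{E}}[\mu^{2}]-\operatorname{\mathbb{E}}[\widehat{\mu}^{2}]$.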

Note that Theorem B is not sufficient to establish Theorem A about the optimality of polynomial-time algorithms to estimate the pairwise correlations $(X_{f})_{u,v}=f(\theta_{u})f(\theta_{v})$ . Indeed, the latter requires approximating the joint distribution of $(\theta_{u},\theta_{v})$ for two arbitrary vertices $u,v\in V_{n}$ . In order to achieve this goal, we define a decoupled estimator:

[TABLE]

Note that $\widehat{{\bm{X}}}^{(\mbox{\rm\tiny dec})}$ may a priori have suboptimal accuracy. This is however not the case for almost all $\varepsilon$ .

Proposition 4.2.

Let $\widehat{{\bm{X}}}^{(\mbox{\rm\tiny dec})}\in\mathbb{R}^{n\times n}$ be defined as per Eq. (4.5). Then for almost every $\varepsilon>0$ ,

[TABLE]

The proof of the above proposition can be found in Appendix A.

Given Theorem B and Proposition 4.2, it is natural to consider the following low complexity version of $\widehat{{\bm{X}}}^{(\mbox{\rm\tiny dec})}$ :

[TABLE]

Since we can compute $\widehat{\mu}_{G_{n},u,l}(x)$ for all but $o(n)$ vertices in time $\mathcal{O}(1)$ , the overall complexity of $\widehat{{\bm{X}}}^{(l)}$ is $\mathcal{O}(n^{2})$ . (Setting $\widehat{X}^{(l)}_{uv}=0$ for a sublinear fraction of vertices produces a negligible error.) We can now prove Theorem A.

Proof of Theorem A. Since Proposition 4.2 yields $\mathcal{R}_{n}(\widehat{{\bm{X}}}^{(\mbox{\rm\tiny dec})};f)-\mathcal{R}_{n}^{\textup{Bayes}}(f)\rightarrow 0$ for almost all $\varepsilon>0$ , we only need to compare the risks of $\widehat{{\bm{X}}}^{(l)}$ and $\widehat{{\bm{X}}}^{(\mbox{\rm\tiny dec})}$ . We have

[TABLE]

We have

[TABLE]

By repeated use of the triangle inequality, this is bounded in absolute value by

[TABLE]

Here, $\|f\|_{\infty}$ denotes the supremum norm of $f$ .

On the other hand, and following a similar strategy,

[TABLE]

Invoking Theorem B concludes the proof.

Theorem B and Proposition 4.1 allow us to control other metrics for the estimation error beyond $\mathcal{R}_{n}(\widehat{{\bm{X}}};f)$ . As an example, we consider the ‘overlap’ metric that applies to estimators $\hat{\bm{\theta}}:{\mathcal{Y}}^{E_{n}}\times{\mathcal{X}}_{\star}^{V_{n}}\to{\mathcal{X}}^{V_{n}}$ , which assign labels to vertices. We define

[TABLE]

where $\mathscr{S}_{q}$ is the set of permutations on ${\mathcal{X}}$ , with $q=|{\mathcal{X}}|$ .

As a corollary of Proposition 4.1, the overlap between a sample from the local marginals and ${\bm{\theta}}$ can be lower-bounded in a nontrivial way (the proof can be found in Appendix A):

Corollary 4.3.

For each $l\geq 1$ , let $\hat{{\bm{\theta}}}^{(l)}=(\hat{\theta}^{(l)}_{u})_{u\in V_{n}}$ where $\hat{\theta}^{(l)}_{u}\sim\widehat{\mu}_{G_{n},u,l}$ independently for all $u\in V_{n}$ . Then for almost every $\varepsilon>0$ ,

[TABLE]

As the radius $l$ of the local balls increases, the performance of $\hat{\bm{\theta}}^{(l)}$ approaches that of a sample drawn from the full marginals $(\mu_{G_{n},u})_{u\in V_{n}}$ .

5 Results for random regular graphs

The assumption of anchored–amenability is crucial in the proofs of Theorems A and B. While we do not know whether a weaker condition is sufficient, we show that these results do not hold for at least one non-amenable case, namely, when $G_{n}$ is a random $k$ -regular graph with constant degree $k$ . For the case of ${\mathbb{Z}}_{q}$ –synchronization we show that in a certain regime of signal-to-noise ratio (SNR), the local estimates of vertex marginals provide no information about the hidden assignment ${\bm{\theta}}$ , while in the same regime, it is information-theoretically possible to estimate ${\bm{\theta}}$ non-trivially.

As mentioned in the introduction, an information-computation gap has been observed in several statistical models. However, none of the rigorous results in the literature matches the setting of Theorems A and B. To the best of our knowledge, the closest example is the case of the stochastic block model with $q$ communities on sparse random graphs (see [Abb17] for a comprehensive survey and the references therein). As explained in Section 3.1, this example fits our framework, although with $G_{n}$ being the complete graph. In particular, $G_{n}$ does not converge to a locally finite graph. In contrast, the example treated in this section satisfies all the assumptions of Theorems A and B except amenability (and tameness). Proofs for this section are deferred to Appendices B and C.

5.1 Information-theoretic reconstruction: An exhaustive search algorithm

Given a graph $G=(V,E)$ on $n$ vertices, ${\bm{\theta}}\in{\mathcal{X}}^{V}$ and ${Y}\in{\mathcal{Y}}^{E}$ , we define the edge empirical distribution

[TABLE]

This is a probability distribution on ${\mathcal{X}}\times{\mathcal{X}}\times{\mathcal{Y}}$ : $\hat{\nu}^{G}_{{\bm{\theta}},{Y}}\in\mathscrsfs{P}({\mathcal{X}}\times{\mathcal{X}}\times{\mathcal{Y}})$ . (Recall that $\mathscrsfs{P}(S)$ denotes the simplex of probability distributions over the set $S$ .) Define $\overline{\nu}\in\mathscrsfs{P}({\mathcal{X}})$ to be the uniform distribution on ${\mathcal{X}}$ and $\overline{\nu}_{\mbox{\rm\tiny e}}\in\mathscrsfs{P}({\mathcal{X}}\times{\mathcal{X}}\times{\mathcal{Y}})$ via

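$$\overline{\nu}_{\mbox{\rm\tiny e}}(x_{1},x_{2},y):=\overline{\nu}(x_{1})\,\overline{\nu}(x_{2})\,Q(y|x_{1},x_{2})\,,\qquad(x_{1},x_{2},y)\in{\mathcal{X}}\times{\mathcal{X}}\times{\mathcal{Y}}\,.$$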

We then define the set of ‘typical’ assignments of node variables by

[TABLE]

We then consider the reconstruction algorithm that outputs a typical configuration

$$\hat{\bm{\theta}}(G,{Y})\in\Theta(\eta_{n};G,{Y})\,.$$

If $\Theta(\eta_{n};G,{Y})$ is empty, we define $\hat{\bm{\theta}}(G,{Y})$ arbitrarily (for instance $\hat{\bm{\theta}}(G,{Y})={\bm{\theta}}_{*}$ for a fixed reference configuration ${\bm{\theta}}_{*}\in{\mathcal{X}}^{V}$ ). If $\Theta(\eta_{n};G,{Y})$ contains more than one element, then $\hat{\bm{\theta}}(G,{Y})$ selects one arbitrarily, e.g., the first one in lexicographic order. In fact, our proofs apply to any algorithm that satisfies condition (5.2) with high probability. As discussed below (see the Remark following Lemma 5.1), this condition is also satisfied by the randomized estimator $\hat{\bm{\theta}}\sim\operatorname{\mathbb{P}}\big{(}\cdot|Y^{(\varepsilon)}_{G_{n}}\big{)}$ that samples from the posterior.
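A minimal brute-force sketch of this exhaustive search follows, in Python. It assumes (i) that the edge empirical distribution is the average, over an oriented edge list, of point masses at $(\theta_{u},\theta_{v},Y_{uv})$, and (ii) that membership in the typical set is tested only through the total variation distance between this empirical distribution and $\overline{\nu}_{\mbox{\rm\tiny e}}(x_{1},x_{2},y)=Q(y|x_{1},x_{2})/|{\mathcal{X}}|^{2}$. The precise typicality conditions are those in the definition of $\Theta(\eta_{n};G,{Y})$, which this sketch may simplify; the function names are hypothetical.

```python
import itertools


def edge_empirical(edges, Y, theta):
    """Empirical distribution of (theta_u, theta_v, Y_uv) over the oriented edge list (assumed convention)."""
    counts = {}
    for (u, v) in edges:
        key = (theta[u], theta[v], Y[(u, v)])
        counts[key] = counts.get(key, 0) + 1
    return {key: c / len(edges) for key, c in counts.items()}


def tv_distance(P, R):
    """Total variation distance between two finitely supported distributions stored as dicts."""
    support = set(P) | set(R)
    return 0.5 * sum(abs(P.get(a, 0.0) - R.get(a, 0.0)) for a in support)


def exhaustive_typical(n, edges, Y, Q, q, ys, eta, theta_ref):
    """First assignment in lexicographic order whose edge empirical distribution is eta-close in TV
    to nu_bar_e(x1, x2, y) = Q(y | x1, x2) / q^2; returns theta_ref if there is none."""
    target = {(x1, x2, y): Q(y, x1, x2) / q**2
              for x1 in range(q) for x2 in range(q) for y in ys}
    for theta in itertools.product(range(q), repeat=n):   # q^n candidates: exponential time
        if tv_distance(edge_empirical(edges, Y, theta), target) <= eta:
            return theta
    return theta_ref


p, q = 0.5, 3
Q = lambda y, a, b: (1 - p) * (y == (a - b) % q) + p / q   # Z_q synchronization channel
edges = [(0, 1), (1, 2), (2, 0)]
Y = {(0, 1): 1, (1, 2): 1, (2, 0): 1}
print(exhaustive_typical(3, edges, Y, Q, q, range(q), eta=0.8, theta_ref=(0, 0, 0)))   # (0, 2, 1)
```

On this triangle, only the three assignments consistent with every observed difference reach a TV distance below $0.8$, and $(0,2,1)$ is the first of them in lexicographic order. The $q^{n}$ loop is what makes this estimator information-theoretic rather than efficient.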

It is immediate to show that the typical set is non-empty with high probability. (Throughout this section, we use ${\bm{\theta}}_{0}$ for the ground truth, in order to distinguish it from a generic vector ${\bm{\theta}}\in{\mathcal{X}}^{n}$ .)

Lemma 5.1.

Let $G_{n}$ be a random $k$ -regular graph on $n$ vertices, and let $({\bm{\theta}}_{0},{Y})$ be distributed according to the random observation model described in the Introduction. Then, there exists $c_{0}=c_{0}(|{\mathcal{X}}|,|{\mathcal{Y}}|)>0$ such that

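$$\operatorname{\mathbb{P}}\big({\bm{\theta}}_{0}\in\Theta(\eta_{n};G_{n},{Y})\big)\geq 1-c_{0}^{-1}\exp\{-c_{0}(\log n)^{2}\}\,.$$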

Remark.

As mentioned above, one might consider a randomized estimator $\hat{\bm{\theta}}$ that outputs a sample from the posterior: $\hat{{\bm{\theta}}}\sim\operatorname{\mathbb{P}}\big{(}\,\cdot\,|Y^{(\varepsilon)}_{G_{n}}\big{)}$ . Note that this satisfies the condition $\hat{{\bm{\theta}}}\in\Theta(\eta_{n},G_{n},{Y})$ (cf. Eq. (5.2)) with the same probability $1-c_{0}^{-1}\exp\{-c_{0}(\log n)^{2}\}$ . Indeed this follows simply by noting that, with this definition, the pair $(\hat{\bm{\theta}},{Y})$ is distributed as $({\bm{\theta}}_{0},{Y})$ . Therefore all the results to follow apply to this randomized estimator as well.

Given two assignments ${\bm{\theta}}_{0},{\bm{\theta}}\in{\mathcal{X}}^{V}$ , we define their joint empirical vertex distribution as

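$$\hat{\omega}_{{\bm{\theta}}_{0},{\bm{\theta}}}(x,x^{\prime}):=\frac{1}{n}\sum_{u\in V}{\mathbf{1}}\{\theta_{0,u}=x,\ \theta_{u}=x^{\prime}\}\,,\qquad x,x^{\prime}\in{\mathcal{X}}\,.$$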

This is a probability distribution on ${\mathcal{X}}\times{\mathcal{X}}$ : $\hat{\omega}_{{\bm{\theta}}_{0},{\bm{\theta}}}\in\mathscrsfs{P}({\mathcal{X}}\times{\mathcal{X}})$ .
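The results below measure success through the total variation distance between $\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}}$ and the uniform product $\overline{\nu}\times\overline{\nu}$. The Python sketch below computes both quantities, assuming the normalization $\hat{\omega}(x,x^{\prime})=\frac{1}{n}\#\{u:\theta_{0,u}=x,\ \theta_{u}=x^{\prime}\}$; the function names are hypothetical.

```python
from collections import Counter


def joint_vertex_empirical(theta0, theta):
    """omega_hat(x, x') = (1/n) #{u : theta0_u = x, theta_u = x'} (assumed normalization)."""
    n = len(theta0)
    return {pair: c / n for pair, c in Counter(zip(theta0, theta)).items()}


def tv_to_uniform_product(omega, q):
    """TV distance between omega_hat and the uniform product distribution on Z_q x Z_q."""
    return 0.5 * sum(abs(omega.get((x, y), 0.0) - 1.0 / q**2)
                     for x in range(q) for y in range(q))


theta0 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(tv_to_uniform_product(joint_vertex_empirical(theta0, theta0), 3))         # approximately 2/3
print(tv_to_uniform_product(joint_vertex_empirical(theta0, [0, 1, 2] * 3), 3))  # 0.0
```

Exact recovery of a balanced assignment gives distance $1-1/q$, an estimate with no correlation gives $0$, and relabeling the estimate by a permutation of ${\mathcal{X}}$ leaves the distance unchanged, which is why this criterion is suited to models with a global label symmetry.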

To state the next result, let us briefly recall some notions from information theory. Given a discrete random variable (or random vector) $X$ , we denote by $H(X)$ the Shannon entropy of the law of $X$ , namely (with a slight abuse of notation) $H(X)=H(P_{X})=-\sum_{x}P_{X}(x)\log P_{X}(x)$ . For a vector $(X_{1},\dots,X_{m})$ , $H(X_{1},\dots,X_{m})=H(P_{X_{1},\dots,X_{m}})$ . The conditional entropy is defined by $H(X|Y)=H(X,Y)-H(Y)$ , and the mutual information by $I(X;Y)=H(X)-H(X|Y)=H(Y)-H(Y|X)$ .

Theorem C.

Assume there exists $c_{M}>0$ such that $c_{M}^{-1}\leq Q(y|x_{1},x_{2})\leq c_{M}$ for all $x_{1},x_{2}\in{\mathcal{X}}$ , $y\in{\mathcal{Y}}$ , and let $(\theta_{1},\theta_{2},Y)$ have joint distribution $\overline{\nu}_{e}$ (recall that $\overline{\nu}_{e}(x_{1},x_{2},y)=\overline{\nu}(x_{1})\overline{\nu}(x_{2})\times Q(y|x_{1},x_{2})$ where $\overline{\nu}$ is the uniform distribution over ${\mathcal{X}}$ ). If

[TABLE]

for some $\varepsilon>0$ , then there exist $\delta=\delta(\varepsilon,c_{M})>0$ and a constant $c_{0}>0$ such that

[TABLE]

The proof of this theorem relies on a truncated first moment method, where we count the expected number of typical assignments ${\bm{\theta}}\in\Theta(\eta_{n};G_{n},{Y})$ having a given value of the empirical overlap distribution $\hat{\omega}_{{\bm{\theta}},{\bm{\theta}}_{0}}$ , conditioned on certain typicality constraints on the instance $(G_{n},{\bm{\theta}}_{0},{Y})$ . The full argument is deferred to Appendix B.2. (We refer, e.g., to [DMS13] for similar calculations in a somewhat simpler context.)

The next corollary applies the result of Theorem C to $\mathbb{Z}_{q}$ –synchronization.

Corollary 5.2.

Consider the $\mathbb{Z}_{q}$ –synchronization problem. If

[TABLE]

then there exist $\delta,c_{0}>0$ depending on $k,p,q$ such that, with probability at least $1-c_{0}^{-1}\exp\{-c_{0}(\log n)^{2}\}$ , $d_{\mbox{\rm\tiny TV}}(\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}},\overline{\nu}\times\overline{\nu})\geq\delta$ .

Furthermore, as $p\to 1$ , we have

[TABLE]

This corollary follows from Theorem C simply by computing $I(\theta_{1},\theta_{2};Y)$ in the case of $\mathbb{Z}_{q}$ –synchronization. We omit the details. Finally, we deduce from Theorem C the possibility of weak recovery.
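For $\mathbb{Z}_{q}$ –synchronization with $\theta_{1},\theta_{2}$ uniform, $Y$ is marginally uniform, so $I(\theta_{1},\theta_{2};Y)=\log q-H(Y|\theta_{1},\theta_{2})$, and given $(\theta_{1},\theta_{2})$ the law of $Y$ puts mass $1-p+p/q$ on $\theta_{1}-\theta_{2}$ and $p/q$ on each other value. The Python sketch below evaluates this closed form (our own computation, not the omitted details). Expanding for small $1-p$ gives $I(\theta_{1},\theta_{2};Y)=\frac{q-1}{2}(1-p)^{2}+O((1-p)^{3})$, so the condition $\frac{k}{2}I(\theta_{1},\theta_{2};Y)\geq H(\theta_{1})+\varepsilon$ with $H(\theta_{1})=\log q$ reads $(1-p)^{2}k\gtrsim\frac{4\log q}{q-1}$, in line with the threshold quoted in Section 5.2.

```python
import math


def mutual_info_zq(p, q):
    """I(theta_1, theta_2; Y) in nats for Z_q synchronization with uniform theta_1, theta_2.

    Y = theta_1 - theta_2 (mod q) w.p. 1 - p and uniform w.p. p, so Y is marginally uniform
    and I = log q - H(Y | theta_1, theta_2).
    """
    a = 1.0 - p + p / q          # probability of the 'correct' value theta_1 - theta_2
    b = p / q                    # probability of each of the q - 1 other values
    xlogx = lambda t: t * math.log(t) if t > 0 else 0.0
    return math.log(q) + xlogx(a) + (q - 1) * xlogx(b)


for p in (0.0, 0.5, 0.9, 0.99):
    s = 1.0 - p
    print(p, mutual_info_zq(p, 5), (5 - 1) * s**2 / 2)   # exact value vs small-(1-p) approximation
```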

Corollary 5.3.

Under the assumptions of Theorem C, if $\frac{k}{2}I(\theta_{1},\theta_{2};Y)\geq H(\theta_{1})+\varepsilon$ , then there exists a constant $\delta=\delta(\varepsilon)>0$ such that

[TABLE]

Moreover, there exists a function $f:{\mathcal{X}}\mapsto\mathbb{R}$ with zero mean, unit variance, and a constant $\delta=\delta(\varepsilon,|{\mathcal{X}}|,c_{M})>0$ such that

[TABLE]

In particular, the conclusions (5.4) and (5.5) hold in the $\mathbb{Z}_{q}$ –synchronization model if $k>k_{*}(p;q)$ .

5.2 Performance of the local algorithm

In this section we examine the asymptotics of the local marginals

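$$\widehat{\mu}_{u,l}(x):=\operatorname{\mathbb{P}}\big(\theta_{u}=x\,\big|\,Y^{(\varepsilon)}_{B_{G_{n}}(u,l)}\big)\,,\qquad x\in\mathbb{Z}_{q}\,,$$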

when $G_{n}$ is a random $k$ -regular graph, in the special case of $\mathbb{Z}_{q}$ –synchronization with side information from BEC( $\bar{\varepsilon}$ ).

We have seen in the previous section that weak recovery is possible (albeit non-efficiently) when $(1-p)^{2}k>\frac{4\log q}{q-1}+\mathcal{O}(1-p)$ , even in the absence of side information (Corollary 5.3). We show, on the other hand, that the local marginals are approximately uniform if $(1-p)^{2}(k-1)<1$ . The latter condition states that the model lies below the Kesten-Stigum threshold for the problem of robust reconstruction on the tree [JM04].
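To see that the two conditions can hold simultaneously, the Python sketch below lists, for given $(p,q)$, the degrees $k$ that satisfy both the sufficient condition $\frac{k}{2}I(\theta_{1},\theta_{2};Y)>\log q$ of Corollary 5.3 and the condition $(1-p)^{2}(k-1)<1$ of Theorem D. It ignores the additional requirement $\varepsilon\leq c$ in Theorem D and all finite-$n$ probabilities; the function names are hypothetical.

```python
import math


def mutual_info_zq(p, q):
    """I(theta_1, theta_2; Y) in nats for Z_q synchronization (Y is marginally uniform)."""
    a, b = 1.0 - p + p / q, p / q
    xlogx = lambda t: t * math.log(t) if t > 0 else 0.0
    return math.log(q) + xlogx(a) + (q - 1) * xlogx(b)


def gap_degrees(p, q, k_max=10_000):
    """Degrees k >= 3 with (k/2) I > log q (weak recovery by exhaustive search, Corollary 5.3)
    and (1 - p)^2 (k - 1) < 1 (below the Kesten-Stigum bound, the regime of Theorem D)."""
    info = mutual_info_zq(p, q)
    s2 = (1.0 - p) ** 2
    tol = 1e-12  # guards the strict inequality against floating-point round-off
    return [k for k in range(3, k_max + 1)
            if 0.5 * k * info > math.log(q) and s2 * (k - 1) < 1.0 - tol]


print(gap_degrees(0.9, 20))   # [94, 95, 96, 97, 98, 99, 100]
print(gap_degrees(0.9, 2))    # []
```

As $p\to 1$ this window, produced by two sufficient conditions, can open only when $\frac{4\log q}{q-1}<1$, i.e. for $q\geq 11$; for $q=2$ it is empty at $p=0.9$, while for $q=20$ it contains $k=94,\dots,100$.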

Theorem D.

Consider $\mathbb{Z}_{q}$ –synchronization with side information from BEC( $\bar{\varepsilon}$ ) on a random $k$ -regular graph $G_{n}$ . There exist constants $c=c(k,p,q)$ and $C=C(k,p,q)$ such that the following holds. If $(1-p)^{2}(k-1)<1$ and $\varepsilon\leq c$ then

[TABLE]

The above theorem implies that all estimators $(\hat{{\bm{\theta}}}^{(l)})_{l\geq 1}$ where $\hat{\theta}^{(l)}_{u}\sim\widehat{\mu}_{u,l}$ independently for all $u\in V_{n}$ , have almost trivial performance. Recall the definition of the matrix $\widehat{{\bm{X}}}^{(l)}$ :

[TABLE]

Corollary 5.4.

In the setting of Theorem D, if $(1-p)^{2}(k-1)<1$ , then there exist constants $c_{1},c_{2}>0$ depending on $k$ and $q$ such that

[TABLE]

Moreover, for all $f:\mathbb{Z}_{q}\mapsto\mathbb{R}$ with zero mean and unit variance,

[TABLE]

Remark.

The above implies that no local algorithm can estimate ${\bm{X}}_{f}$ with non-trivial accuracy. Indeed, the minimal-risk estimator of $f(\theta_{u})f(\theta_{v})$ based on the information contained in the balls of radius $l$ centered at $u$ and $v$ is $\operatorname{\mathbb{E}}\big{[}f(\theta_{u})f(\theta_{v})|Y^{(\varepsilon)}_{B_{G_{n}}(u,l)\cup B_{G_{n}}(v,l)}\big{]}$ . The latter quantity is equal to $\widehat{X}^{(l)}_{uv}$ if the two balls are disjoint, which is the case for a $1-o_{n}(1)$ fraction of pairs of vertices $(u,v)$ when $l$ is held constant.

The proof of Theorem D is deferred to Appendix C.1, but we give an outline here. We use local–weak convergence to first lift the problem to the infinite $k$ -regular tree, on which the study of the local marginals reduces to the study of a certain distributional recursion. Then we prove that below the Kesten-Stigum threshold, the uniform distribution $\overline{\nu}$ is a stable fixed point of this recursion. The argument proceeds as follows. Let $o$ be the root of the infinite $(k-1)$ –ary tree $T$ and denote by $T(l)$ the subtree consisting of the first $l$ generations of $T$ rooted at $o$ . Now let $\mu_{o,l}(x):=\operatorname{\mathbb{P}}\big{(}\theta_{o}=x|{Y}^{(\varepsilon)}_{T(l)}\big{)}$ for all $x\in\mathbb{Z}_{q}$ and consider the sequence $z_{l}:=\operatorname{\mathbb{E}}[\mu_{o,l}(\theta_{o})|\xi_{o}=\star]-\frac{1}{q}$ , which measures the deviation from uniformity of the local marginal at the root. We use the recursive structure of the tree to show that for $\varepsilon$ small enough and $\kappa=(1-p)^{2}(k-1)$ , the sequence $(z_{l})_{l\geq 0}$ satisfies the approximate recursion

[TABLE]

where $C(q)$ is a constant depending only on $q$ . Since $z_{0}=0$ , this implies that if $\kappa<1$ then the sequence stays within an interval of size $C^{\prime}(q,\kappa)\varepsilon$ around the origin. This, in turn, can be converted to the claim of Theorem D. The analysis of this recursion originates in the study of the robust reconstruction problem on the tree. In this problem, a spin at the root (an ${\mathcal{X}}$ -valued r.v.) is broadcast through noisy channels along the edges of the tree. The statistician observes a noisy realization of this process on the leaves of $T(l)$ for large $l$ , and is tasked with inferring the value at the root (see, e.g., [EKPS00, MP03, JM04]). Similar recursions also arise in the study of the ‘robustness’ of phase transitions in the Ising model on the tree [PS99]. In particular, our analysis builds on ideas from [MM06, Sly11].
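The quantity $z_{l}$ can also be estimated numerically. The Python sketch below samples the broadcast process on the first $l$ generations of the $(k-1)$ –ary tree, computes $\mu_{o,l}$ exactly by belief propagation (which is exact on trees), and averages $\mu_{o,l}(\theta_{o})$ over samples in which the root's side information is erased. It is a Monte Carlo illustration of the quantity controlled by the recursion, not the proof technique; the function names, sample sizes and parameter values are hypothetical.

```python
import random


def channel(y, x_parent, x_child, p, q):
    """Z_q synchronization channel Q(y | theta_parent, theta_child) on the edge (parent, child)."""
    return (1 - p) * (y == (x_parent - x_child) % q) + p / q


def subtree_likelihood(theta_v, depth, d, p, q, eps, rng, is_root=False):
    """Sample the data in the subtree of a vertex labeled theta_v (d children per vertex, `depth`
    generations) and return its likelihood as a function of theta_v, normalized to sum to one."""
    like = [1.0] * q
    if not is_root and rng.random() < eps:                      # side information revealed (BEC)
        like = [1.0 if x == theta_v else 0.0 for x in range(q)]
    for _ in range(d if depth > 0 else 0):
        theta_c = rng.randrange(q)
        y = (theta_v - theta_c) % q if rng.random() < 1 - p else rng.randrange(q)
        child = subtree_likelihood(theta_c, depth - 1, d, p, q, eps, rng)
        msg = [sum(channel(y, x, xc, p, q) * child[xc] for xc in range(q)) for x in range(q)]
        like = [a * b for a, b in zip(like, msg)]
    total = sum(like)
    return [a / total for a in like]


def estimate_z(l, k, p, q, eps, samples=2000, seed=0):
    """Monte Carlo estimate of z_l = E[mu_{o,l}(theta_o) | xi_o = erased] - 1/q on the (k-1)-ary tree."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        theta_o = rng.randrange(q)
        mu = subtree_likelihood(theta_o, l, k - 1, p, q, eps, rng, is_root=True)  # uniform prior
        acc += mu[theta_o]
    return acc / samples - 1.0 / q


# kappa = (1 - p)^2 (k - 1) with k = 3: 0.5 for p = 0.5 (below KS), 1.62 for p = 0.1 (above KS)
for p in (0.5, 0.1):
    print(p, [round(estimate_z(l, 3, p, 3, eps=0.05, samples=400), 3) for l in range(6)])
```

Below the Kesten-Stigum bound one expects the estimates to remain small, of order $\varepsilon$ up to Monte Carlo error of order $\texttt{samples}^{-1/2}$. With a pure-noise channel ( $p=1$ ) the root marginal is exactly uniform, and with a noiseless channel and fully revealed children it is a point mass, giving $z_{1}=1-1/q$.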

6 Proof of Proposition 4.1

We start with the proof of (4.3), which is straightforward and does not need the amenability assumption. The proof of (4.4) will crucially hinge upon a property of decay of certain point–to–set correlations (Lemma 6.2), which we establish using anchored–amenability and the presence of $\varepsilon$ –side information. For ease of notation, we adopt the following convention in this section: in quantities of the form $\operatorname{\mathbb{P}}\big{(}\theta_{o}=x|Y^{(\varepsilon)}_{A}\big{)}$ where $A$ is any subgraph of $G$ , it is implicit that the rooted graph $(G,o)$ is also conditioned on, abbreviating the more accurate but lengthier notation $\operatorname{\mathbb{P}}\big{(}\theta_{o}=x|(G,o),Y^{(\varepsilon)}_{A}\big{)}$ .

6.1 Proof of the ‘local’ statement (4.3)

Let $x\in{\mathcal{X}}$ and $l\geq 1$ . The function $f:\mathcal{G}_{*}\to[0,1]$ defined by $f(G,o)=\operatorname{\mathbb{E}}\big{[}\operatorname{\mathbb{P}}\big{(}\theta_{o}=x|Y^{(\varepsilon)}_{[G,o]_{l}}\big{)}^{2}\big{]}$ is clearly continuous in the topology of local convergence. Indeed for $(G_{n},o_{n})\xrightarrow[]{loc.}(G,o)$ , let $n_{0}\geq 1$ such that $[G_{n},o_{n}]_{l}\equiv[G,o]_{l}$ for all $n\geq n_{0}$ . Hence $f(G_{n},o_{n})=f(G,o)$ for all $n\geq n_{0}$ . Since $f$ is also bounded, we obtain by local–weak convergence under uniform rooting that

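$$\lim_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)^{2}\big]=\lim_{n\to\infty}\operatorname{\mathbb{E}}_{\rho_{G_{n}}}f(G_{n},o_{n})=\operatorname{\mathbb{E}}_{\rho}f(G,o)=\operatorname{\mathbb{E}}\Big[\operatorname{\mathbb{P}}\big(\theta_{o}=x\,\big|\,Y^{(\varepsilon)}_{[G,o]_{l}}\big)^{2}\Big]\,.$$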

Next, we observe that the sequence $\big{(}\operatorname{\mathbb{P}}\big{(}\theta_{o}=x|(G,o),Y^{(\varepsilon)}_{[G,o]_{l}}\big{)}\big{)}_{l\geq 1}$ is a bounded martingale, therefore it converges almost surely and in $\mathbb{L}_{2}$ to $\operatorname{\mathbb{P}}\big{(}\theta_{o}=x|Y^{(\varepsilon)}_{G}\big{)}$ by Lévy’s upward theorem. This concludes the proof of the first statement (4.3):

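$$\lim_{l\to\infty}\lim_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)^{2}\big]=\lim_{l\to\infty}\operatorname{\mathbb{E}}\Big[\operatorname{\mathbb{P}}\big(\theta_{o}=x\,\big|\,Y^{(\varepsilon)}_{[G,o]_{l}}\big)^{2}\Big]=\operatorname{\mathbb{E}}\Big[\operatorname{\mathbb{P}}\big(\theta_{o}=x\,\big|\,Y^{(\varepsilon)}_{G}\big)^{2}\Big]\,.$$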

6.2 Proof of the ‘global’ statement (4.4)

The proof breaks into three parts. First, we easily obtain a lower bound from Jensen’s inequality:

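$$\operatorname{\mathbb{E}}\big[\mu_{G_{n},u}(x)^{2}\big]\geq\operatorname{\mathbb{E}}\Big[\operatorname{\mathbb{E}}\big[\mu_{G_{n},u}(x)\,\big|\,Y^{(\varepsilon)}_{B_{G_{n}}(u,l)}\big]^{2}\Big]=\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)^{2}\big]\qquad\text{for all }l\geq 1\,.$$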

Therefore

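$$\liminf_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\big[\mu_{G_{n},u}(x)^{2}\big]\geq\lim_{l\to\infty}\lim_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\big[\widehat{\mu}_{G_{n},u,l}(x)^{2}\big]=\operatorname{\mathbb{E}}\Big[\operatorname{\mathbb{P}}\big(\theta_{o}=x\,\big|\,Y^{(\varepsilon)}_{G}\big)^{2}\Big]\,,\tag{6.1}$$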

where the last equality is the content of statement (4.3). As for the upper bound, we have

Lemma 6.1.

Consider the $\sigma$ -algebra

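$$\mathcal{T}^{(\varepsilon)}_{\infty}:=\bigcap_{l\geq 0}\sigma\Big(Y^{(\varepsilon)}_{G},\ \{\theta_{v}:\,v\in V(G),\ d_{G}(o,v)\geq l\}\Big)\,,$$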

where $d_{G}$ is the distance in $G$ . Then

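$$\limsup_{n\to\infty}\frac{1}{n}\sum_{u\in V_{n}}\operatorname{\mathbb{E}}\big[\mu_{G_{n},u}(x)^{2}\big]\leq\operatorname{\mathbb{E}}\Big[\operatorname{\mathbb{P}}\big(\theta_{o}=x\,\big|\,(G,o),\mathcal{T}^{(\varepsilon)}_{\infty}\big)^{2}\Big]\,.\tag{6.2}$$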

Proof.

Fix $u\in V_{n}$ and $x\in{\mathcal{X}}$ . We condition on the r.v.’s $\theta_{\partial B_{G_{n}}(u,l)}:=\{\theta_{v}:v\in V_{n},d_{G_{n}}(u,v)=l\}$ and apply Jensen’s inequality:

[TABLE]

We now observe that conditionally on the boundary variables $\theta_{\partial B_{G_{n}}(u,l)}$ , $\theta_{u}$ is independent of $Y^{(\varepsilon)}_{vw}$ for all $v$ and $w$ outside the ball $B_{G_{n}}(u,l)$ . This is guaranteed by the spatial Markov property of the model. Therefore

[TABLE]

The event on the right–hand side is localized to a ball of fixed radius. So by local–weak convergence, we pass to the limiting rooted graph $(G,o)$ , (similarly to the proof of (4.3)):

[TABLE]

Now using the same Markov property as above, the expectation in the right–hand side remains unchanged if we further condition on ${\mathcal{F}}_{o}^{\geq l}:=\{\theta_{v}:v\in B_{G}(o,l)^{c}\}$ and $\{Y^{(\varepsilon)}_{u,v}:u,v\in B_{G}(o,l)^{c}\}$ , which are beyond the boundary of $B_{G}(o,l)$ : the extra information is irrelevant to $\theta_{o}$ . We arrive at the upper bound

[TABLE]

Now we observe that the sequence $\big{(}\operatorname{\mathbb{P}}\big{(}\theta_{o}=x|(G,o),Y^{(\varepsilon)}_{G},{\mathcal{F}}_{o}^{\geq l}\big{)}\big{)}_{l\geq 1}$ is a bounded backward martingale (since the corresponding filtration is decreasing), which converges to $\operatorname{\mathbb{P}}\big{(}\theta_{o}=x|(G,o),\mathcal{T}^{(\varepsilon)}_{\infty}\big{)}$ a.s. and in $\mathbb{L}_{2}$ by Lévy’s downward theorem. This concludes the argument. $\blacksquare$

The last piece of the proof is to show that the lower and upper bounds (6.1) and (6.2) coincide when $(G,o)$ is a.s. anchored–amenable:

Proposition 6.2.

Assume $(G,o)\sim\rho$ is unimodular, almost surely anchored–amenable and tame. Then for almost every $\varepsilon>0$ and all $x\in{\mathcal{X}}$ ,

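$$\operatorname{\mathbb{E}}\Big[\operatorname{\mathbb{P}}\big(\theta_{o}=x\,\big|\,(G,o),Y^{(\varepsilon)}_{G}\big)^{2}\Big]=\operatorname{\mathbb{E}}\Big[\operatorname{\mathbb{P}}\big(\theta_{o}=x\,\big|\,(G,o),\mathcal{T}^{(\varepsilon)}_{\infty}\big)^{2}\Big]\,.$$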

This is the only part of the proof which requires assumptions on the limiting random rooted graph, and the presence of non-zero side information from BEC( $\bar{\varepsilon}$ ). We reiterate that unimodularity is guaranteed if $(G,o)$ is the limit of a sequence of finite graphs (see Section 3), so it is automatically satisfied in our setting.

The first ingredient in the proof of Proposition 6.2 is the following generic lemma, which allows us to control the dependence, under the posterior, between variables $\theta_{u}$ associated to vertices in the interior of a set $S\subset V(G)$ and the variables $\theta_{\partial S}$ associated to the boundary of this set. Specifically, the lemma bounds the mutual information between $\theta_{u}$ and $\theta_{\partial S}$ . This result is inspired by Lemma 3.1 in [Mon08]. Let us recall the definition of the conditional mutual information between $X$ and $Y$ given $Z$ : $I(X;Y|Z)=H(X|Z)-H(X|Y,Z)=H(Y|Z)-H(Y|X,Z)$ , where $H(X|Y)=H(X,Y)-H(Y)$ is the conditional entropy.
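As a quick numerical sanity check of the two expressions for conditional mutual information, the following Python sketch builds an arbitrary joint pmf of $(X,Y,Z)$ on small alphabets (the alphabet sizes, random weights and helper names are illustrative choices of ours, not taken from the paper) and evaluates both $H(X|Z)-H(X|Y,Z)$ and $H(Y|Z)-H(Y|X,Z)$.

```python
import itertools
import math
import random

def entropy(pmf):
    """Shannon entropy (nats) of a dict {outcome: probability}."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def marginal(joint, idx):
    """Marginal of a joint pmf over tuples, keeping the coordinates listed in idx."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_entropy(joint, a, b):
    """H(A | B) = H(A, B) - H(B); a and b are tuples of coordinate indices."""
    return entropy(marginal(joint, a + b)) - entropy(marginal(joint, b))

# Arbitrary joint pmf of (X, Y, Z) on {0,1} x {0,1,2} x {0,1}.
rng = random.Random(0)
outcomes = list(itertools.product(range(2), range(3), range(2)))
weights = [rng.random() for _ in outcomes]
total = sum(weights)
joint = {o: w / total for o, w in zip(outcomes, weights)}

X, Y, Z = (0,), (1,), (2,)
i1 = cond_entropy(joint, X, Z) - cond_entropy(joint, X, Y + Z)  # H(X|Z) - H(X|Y,Z)
i2 = cond_entropy(joint, Y, Z) - cond_entropy(joint, Y, X + Z)  # H(Y|Z) - H(Y|X,Z)
print(i1, i2)
```

Both forms reduce to $H(X,Z)+H(Y,Z)-H(X,Y,Z)-H(Z)$, so they agree up to floating-point rounding.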

Lemma 6.3.

Let $G$ be a graph, and $S\subset V(G)$ finite and non-empty. For all $\varepsilon\geq 0$ , we have

[TABLE]

Proof.

The argument relies on differentiating the conditional Shannon entropy of $\theta_{\partial S}$ given $Y^{(\varepsilon)}_{S}$ with respect to $\varepsilon$ . Let us first replace the single parameter $\varepsilon$ (the probability of non-erasure) by a set of parameters $\underline{\varepsilon}=(\varepsilon_{u})_{u\in S}$ : for each vertex $u$ , $\theta_{u}$ is revealed with probability $\varepsilon_{u}$ . We also replace the notation $Y^{(\underline{\varepsilon})}_{S}$ by $(Y,\xi)$ , omitting an explicit reference to $\underline{\varepsilon}$ and to the set $S$ . Finally, we denote by $\xi^{\backslash(u)}=\{\xi_{v}:v\in S,v\neq u\}$ the side information with $\xi_{u}$ removed. We have

[TABLE]

Taking a derivative w.r.t. $\varepsilon_{u}$ yields:

[TABLE]

where the latter is the conditional mutual information of $\theta_{u}$ and $\theta_{\partial S}$ given $(Y,\xi^{\backslash(u)})$ . Now we set $\varepsilon_{u}=\varepsilon$ for all $u\in S$ . We obtain

[TABLE]

We now integrate w.r.t. $\varepsilon$ :

[TABLE]

The second line is by non-negativity of entropy, the third line follows from the fact that conditioning reduces entropy, the fourth line is by sub-additivity, and the last line holds because $\theta_{u}$ is marginally uniform on ${\mathcal{X}}$ . We finish the proof by observing that $I\big{(}\theta_{u};\theta_{\partial S}|Y,\xi\big{)}=I\big{(}\theta_{u};\theta_{\partial S}|Y,\xi^{\backslash(u)},\xi_{u}\big{)}\leq I\big{(}\theta_{u};\theta_{\partial S}|Y,\xi^{\backslash(u)}\big{)}$ : the conditional mutual information vanishes whenever $\xi_{u}\neq\star$ (since $\theta_{u}$ is then revealed), and on the event $\xi_{u}=\star$ , which is independent of everything else, it coincides with the right–hand side. $\blacksquare$

Next, we translate Lemma 6.3 into an average statement about the decay of point–to–set correlations:

Lemma 6.4.

Let $G$ be a graph and $S\subset V(G)$ finite and non-empty. Then for all $\varepsilon>0$ ,

[TABLE]

Proof.

Let $L\geq\text{diam}(S)$ so that $S\subseteq B_{G}(u,L)$ for every $u\in S$ . By Jensen’s inequality we have

[TABLE]

and

[TABLE]

Therefore,

[TABLE]

The last quantity is equal to

[TABLE]

Summing over $x$ and using (6.3) we get

[TABLE]

We send $L$ to infinity and use martingale convergence on the left-hand side, and use Pinsker’s inequality on the right-hand side to obtain

[TABLE]
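Pinsker’s inequality is used here in the form $d_{\rm TV}(p,q)\leq\sqrt{D(p\|q)/2}$ with natural logarithms (we assume this is the normalization intended in the elided displays). The sketch below checks it on random distributions over four points; the helper names are ours.

```python
import math
import random

def tv_distance(p, q):
    """Total variation distance: half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def random_pmf(rng, size):
    w = [rng.random() + 1e-3 for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(1)
worst_ratio = 0.0
for _ in range(1000):
    p, q = random_pmf(rng, 4), random_pmf(rng, 4)
    tv, kl = tv_distance(p, q), kl_divergence(p, q)
    # Pinsker: d_TV(p, q) <= sqrt(D(p || q) / 2), i.e. this ratio is at most 1.
    worst_ratio = max(worst_ratio, tv / math.sqrt(kl / 2))
print(worst_ratio)
```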

Now we obtain the desired result by averaging over $u\in S$ and $\varepsilon$ and using Jensen’s inequality, and then invoking Lemma 6.3:

[TABLE]

$\blacksquare$

Now we are in a position to prove Proposition 6.2.

Proof of Proposition 6.2.

Assume $(G,o)$ is almost surely anchored–amenable and tame. Let $(S_{k}=S_{k}(G,o))_{k\geq 1}$ be the sequence of finite measurable random subsets of $V(G)$ satisfying the conditions of Definition 3.5 (recall in particular that $o\in S_{k}$ .) We use Lemma 6.4 with this choice of sequence $(S_{k})_{k\geq 1}$ , and then average over the realization of the rooted graph $(G,o)\sim\rho$ :

[TABLE]

where $\Delta_{k}\to 0$ by an application of dominated convergence (since $|\partial S_{k}(G,o)|/|S_{k}(G,o)|\to 0$ almost surely by assumption). Now let $f_{k}:\mathcal{G}_{**}\to\mathbb{R}_{+}$ be defined by

[TABLE]

With this notation, expression (6.4) is equal to $\operatorname{\mathbb{E}}_{\rho}\big{[}\sum_{u\in V(G)}f_{k}(G,o,u)\big{]}$ . By unimodularity of $\rho$ , this is also equal to

[TABLE]

where $\alpha_{k}(G,o)=\sum_{u\in V(G)}{\mathbf{1}}_{o\in S_{k}(G,u)}|S_{k}(G,u)|^{-1}$ . Now we use the tameness assumption: the sequence $(1/\alpha_{k}(G,o))_{k}$ is tight. Let $Z(G,o)$ be the above integral over $\varepsilon^{\prime}$ (so that the above display is $\operatorname{\mathbb{E}}_{\rho}[\alpha_{k}(G,o)\cdot Z(G,o)]$ .) For $\eta>0$ let $\delta>0$ be such that

[TABLE]

Since all involved quantities are nonnegative, we have

[TABLE]

Since $Z(G,o)\leq\varepsilon|{\mathcal{X}}|$ a.s., we obtain

[TABLE]

where we have used (6.4) and (6.2) to obtain the last display. Letting $k\to\infty$ and then $\eta\to 0$ we obtain for all $\varepsilon\geq 0$ ,

[TABLE]

and this concludes the proof. $\blacksquare$

Acknowledgements

This work was partially supported by grants NSF DMS-1613091, CCF-1714305, IIS-1741162, and ONR N00014-18-1-2729.

Appendix A Amenable graphs: some omitted proofs

A.1 Proof of Proposition 4.2

The proof is based on a decoupling principle under $\varepsilon$ -perturbation of a general observation channel. This principle is given in Lemma 3.1 in [Mon08], which once specialized to our setting, takes the following form:

Lemma A.1 (Lemma 3.1 [Mon08]).

For all $\varepsilon>0$ , it holds that

[TABLE]

This is very similar to our Lemma 6.3; in fact, the proof of the latter follows the same lines.

Recall the definition of the decoupled estimator

[TABLE]

For a pair of vertices $u,v\in V_{n}$ we let $\mu_{u,v,G_{n}}(x,x^{\prime}):=\operatorname{\mathbb{P}}\big{(}\theta_{u}=x,\theta_{v}=x^{\prime}\big{|}Y^{(\varepsilon)}_{G_{n}}\big{)}$ , for $x,x^{\prime}\in{\mathcal{X}}$ . Expanding the squares and cancelling equal terms we have

[TABLE]

Moreover,

[TABLE]

and

[TABLE]

Therefore,

[TABLE]

We used Pinsker’s inequality and Jensen’s inequality in the last line. We apply Lemma A.1 and Jensen’s inequality and obtain for all $\varepsilon>0$ ,

[TABLE]

Since the integrand is non-negative, it too converges to zero almost everywhere.

A.2 Proof of Corollary 4.3

The proof follows from statement (4.3) of Proposition 4.1 since

[TABLE]

Here in $(a)$ we used the fact that, by construction $\hat{\theta}_{u}\sim\widehat{\mu}_{G_{n},u,l}(\,\cdot\,)$ and in $(b)$ the remark, already made in the proof of Theorem B, that $\operatorname{\mathbb{E}}\big{[}\widehat{\mu}_{G_{n},u,l}(x)\mu_{G_{n},u}(x)\big{]}=\operatorname{\mathbb{E}}\big{[}\widehat{\mu}_{G_{n},u,l}(x)^{2}\big{]}$ .

Appendix B Information-theoretic reconstruction on random graphs: Technical proofs

B.1 Proof of Lemma 5.1

This is a consequence of McDiarmid’s bounded differences inequality. For $(u,v)\in E$ and $(x_{1},x_{2},y_{12})\in{\mathcal{X}}\times{\mathcal{X}}\times{\mathcal{Y}}$ , we let $X_{uv}(x_{1},x_{2},y_{12})=\mathds{1}\{\theta_{0,u}=x_{1},\theta_{0,v}=x_{2},Y_{uv}=y_{12}\}$ , and let $Z(x_{1},x_{2},y_{12})=\frac{1}{|E|}\sum_{(u,v)\in E}\big(X_{uv}(x_{1},x_{2},y_{12})-\operatorname{\mathbb{E}}[X_{uv}(x_{1},x_{2},y_{12})]\big)$ . Since

[TABLE]

we have

[TABLE]

We associate to each edge $(i,j)\in E$ an independent random variable $U_{ij}\sim{\sf Unif}([0,1])$ . We can then construct a function $f:{\mathcal{X}}\times{\mathcal{X}}\times[0,1]\to{\mathcal{Y}}$ , such that $Q(y_{12}|\theta_{1},\theta_{2})={\mathbb{P}}(f(\theta_{1},\theta_{2},U_{12})=y_{12}|\theta_{1},\theta_{2})$ . Hence we can define ${Y},{\bm{\theta}}_{0}$ by letting $Y_{uv}=f(\theta_{0,u},\theta_{0,v},U_{uv})$ for each $(u,v)\in E$ , and we view $Z(x_{1},x_{2},y_{12})$ as a function of the independent random variables $\{\theta_{0,u},U_{uv}\}$ .

Moreover, if we change the value $\theta_{0,u}$ at vertex $u$ to $\theta^{\prime}_{0,u}$ and call $Z^{\prime}(x_{1},x_{2},y_{12})$ the resulting value of $Z(x_{1},x_{2},y_{12})$ , we have $|Z-Z^{\prime}|\leq\frac{k}{|E|}=\frac{2}{n}$ (recall that $k$ is the degree of $u$ and $|E|=\frac{nk}{2}$ ). If instead we change $U_{uv}$ to $U^{\prime}_{uv}$ at a single edge $(u,v)\in E$ , we have $|Z-Z^{\prime}|\leq 1/|E|$ . The bounded differences inequality then implies

[TABLE]

Now we let $\eta^{\prime}=2\frac{\eta}{|{\mathcal{X}}|^{2}|{\mathcal{Y}}|}$ and $\eta=\frac{(\log n)}{\sqrt{n}}$ .
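For concreteness, here is a sketch of the tail bound that McDiarmid’s inequality gives with the bounded differences computed above, assuming the standard two-sided form $\mathbb{P}(|Z|\geq t)\leq 2\exp(-2t^{2}/\sum_i c_i^{2})$ with $c_i=2/n$ for each of the $n$ vertex labels and $c_i=1/|E|$ for each of the $|E|$ edge seeds. At $t=\eta=(\log n)/\sqrt{n}$ the exponent equals $-2k(\log n)^2/(4k+2)$, which is the $\exp\{-c(\log n)^2\}$ rate appearing in Theorem E. The union bound over the $|\mathcal{X}|^2|\mathcal{Y}|$ triples is not included, and the function name is ours.

```python
import math

def mcdiarmid_tail(n, k, t):
    """Two-sided McDiarmid bound on P(|Z| >= t) for the centered statistic Z.

    Bounded differences as derived above: each of the n vertex labels moves Z by
    at most k/|E| = 2/n, each of the |E| = n*k/2 edge seeds U_uv by at most 1/|E|.
    """
    num_edges = n * k / 2
    sum_sq = n * (2 / n) ** 2 + num_edges * (1 / num_edges) ** 2
    return 2 * math.exp(-2 * t ** 2 / sum_sq)

k = 3
results = []
for n in (10 ** 3, 10 ** 4, 10 ** 5):
    eta = math.log(n) / math.sqrt(n)
    bound = mcdiarmid_tail(n, k, eta)
    # Closed form of the same bound: 2 exp(-2k (log n)^2 / (4k + 2)).
    closed = 2 * math.exp(-2 * k * math.log(n) ** 2 / (4 * k + 2))
    results.append((n, bound, closed))
    print(n, bound, closed)
```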

B.2 Proof of Theorem C: A truncated first moment method

Instead of working directly with the ensemble of random regular graphs, we will use the configuration model [Bol80] for our moment computations. Let $kn$ be even and let $\mathcal{M}_{nk}$ be the set of perfect matchings on $nk$ vertices. For $\mathfrak{m}\in\mathcal{M}_{nk}$ we define the multi-graph $G(\mathfrak{m})$ on $n$ vertices where a vertex $i^{\prime}\in[nk]$ in $\mathfrak{m}$ is sent to a vertex $i$ in $G(\mathfrak{m})$ through the mapping $i^{\prime}\mapsto i=i^{\prime}\pmod{n}$ . The resulting multi-graph may contain multiple edges and self-loops. The configuration model is the probability measure $\operatorname{\mathbb{P}}^{\mbox{\tiny\rm cm}}_{n,k}$ on multi-graphs induced by the uniform measure on perfect matchings through the above mapping. The measure $\operatorname{\mathbb{P}}^{\mbox{\tiny\rm cm}}_{n,k}$ conditioned on the multi-graph $G(\mathfrak{m})$ being simple (i.e., having neither self-loops nor multiple edges) is the uniform measure on $k$ -regular graphs $\operatorname{\mathbb{P}}^{\mbox{\tiny\rm reg}}_{n,k}$ . The probability that $G(\mathfrak{m})$ is simple under $\operatorname{\mathbb{P}}^{\mbox{\tiny\rm cm}}_{n,k}$ is $(1-\mathcal{O}(k^{3}/n))e^{(1-k^{2})/4}$ for large $n$ by a formula of McKay and Wormald [Wor99]. Therefore, for any event $A$ and sequence $\varepsilon_{n}\to 0$ , $\operatorname{\mathbb{P}}^{\mbox{\tiny\rm cm}}_{n,k}(A)\geq 1-\varepsilon_{n}$ implies $\operatorname{\mathbb{P}}^{\mbox{\tiny\rm reg}}_{n,k}(A)\geq 1-c(k)\varepsilon_{n}$ with $c(k)>0$ depending only on $k$ .
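The reduction from the configuration model to random regular graphs relies on the simplicity probability tending to $e^{(1-k^{2})/4}$. The following Monte Carlo sketch samples the configuration model as described (half-edge $h$ sits at vertex $h \bmod n$, and a uniformly random ordering of half-edges is paired consecutively) and compares the empirical frequency of simple graphs with this limit; $n$, $k$, the number of trials and the seed are arbitrary choices of ours.

```python
import math
import random

def sample_configuration_model(n, k, rng):
    """Pair the n*k half-edges uniformly at random; half-edge h belongs to vertex h % n."""
    half_edges = list(range(n * k))
    rng.shuffle(half_edges)
    # Consecutive entries of a uniformly random ordering form a uniform perfect matching.
    return [(half_edges[2 * i] % n, half_edges[2 * i + 1] % n) for i in range(n * k // 2)]

def is_simple(edges):
    seen = set()
    for u, v in edges:
        if u == v:  # self-loop
            return False
        key = (min(u, v), max(u, v))
        if key in seen:  # multiple edge
            return False
        seen.add(key)
    return True

rng = random.Random(0)
n, k, trials = 500, 3, 1000
hits = sum(is_simple(sample_configuration_model(n, k, rng)) for _ in range(trials))
estimate = hits / trials
limit = math.exp((1 - k ** 2) / 4)  # McKay-Wormald limit
print(estimate, limit)
```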

Let $G_{n}=(V_{n},E_{n})$ be from the configuration model with $V_{n}=[n]$ . We will assume edges to be directed, and the direction to be chosen uniformly at random. The number of such graphs is

[TABLE]

Indeed, $N_{n,k}$ is the number of ordered pairings of the $nk$ half-edges. Such a pairing can be constructed by ordering the $nk$ half-edges (which can be done in $(nk)!$ possible ways), and then pairing consecutive half edges following this ordering. Each pairing can arise in $(nk/2)!$ possible ways.
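The counting argument above gives $N_{n,k}=(nk)!/(nk/2)!$ ordered pairings. This can be confirmed by brute force for a small number of half-edges; the recursive counter below is an illustrative sketch of ours, not part of the paper.

```python
import math

def count_ordered_pairings(items):
    """Count partitions of `items` into ordered pairs (tail, head), by recursion."""
    if not items:
        return 1
    rest = items[1:]
    total = 0
    for i in range(len(rest)):
        remaining = rest[:i] + rest[i + 1:]
        # items[0] is paired with rest[i]; it may be either the tail or the head.
        total += 2 * count_ordered_pairings(remaining)
    return total

pairs = []
for num_half_edges in (2, 4, 6, 8, 10):
    brute = count_ordered_pairings(list(range(num_half_edges)))
    formula = math.factorial(num_half_edges) // math.factorial(num_half_edges // 2)
    pairs.append((num_half_edges, brute, formula))
    print(num_half_edges, brute, formula)
```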

We next state a standard counting lemma that will be useful in what follows. Given finite alphabets $\overline{\mathcal{X}},\overline{\mathcal{Y}}$ , and integers $n,k$ with $nk=2m$ even, let $\mathscrsfs{P}_{k}(\overline{\mathcal{X}}\times\overline{\mathcal{X}}\times\overline{\mathcal{Y}})\subseteq\mathscrsfs{P}(\overline{\mathcal{X}}\times\overline{\mathcal{X}}\times\overline{\mathcal{Y}})$ be the subset of probability distributions $\nu\in\mathscrsfs{P}(\overline{\mathcal{X}}\times\overline{\mathcal{X}}\times\overline{\mathcal{Y}})$ such that $\nu(x_{1},x_{2},y)\in{\mathbb{N}}/m$ for all $x_{1},x_{2}\in\overline{\mathcal{X}}$ , $y\in\overline{\mathcal{Y}}$ , and $\sum_{\tilde{x},y}(\nu(x,\tilde{x},y)+\nu(\tilde{x},x,y))\in{\mathbb{N}}/n$ for all $x\in\overline{\mathcal{X}}$ .

Given $\nu\in\mathscrsfs{P}(\overline{\mathcal{X}}\times\overline{\mathcal{X}}\times\overline{\mathcal{Y}})$ , we let $\pi_{1}\nu(x)\equiv\sum_{\tilde{x}\in\overline{\mathcal{X}},y\in\overline{\mathcal{Y}}}\nu(x,\tilde{x},y)$ , $\pi_{2}\nu(x)\equiv\sum_{\tilde{x}\in\overline{\mathcal{X}},y\in\overline{\mathcal{Y}}}\nu(\tilde{x},x,y)$ . We further let $\pi_{12}\nu(x_{1},x_{2})=\sum_{y\in\overline{\mathcal{Y}}}\nu(x_{1},x_{2},y)$ .

Recall that the Shannon entropy of a probability distribution $p$ on a finite set $S$ is $H(p)=-\sum_{x\in S}p(x)\log p(x)$ , and that the joint empirical edge distribution of $({\bm{\theta}},{Y})$ on a graph $G$ is

[TABLE]

Lemma B.1.

For such $\nu$ , let $N_{n,k}(\nu)$ be the number of triples $(G,{\bm{\theta}},{Y})$ where $G=(V=[n],E)$ is a graph from the configuration model, ${\bm{\theta}}\in\overline{\mathcal{X}}^{V}$ , ${Y}\in\overline{\mathcal{Y}}^{E}$ , with edge empirical distribution equal to $\nu$ . Let $\nu_{v}\equiv(\pi_{1}\nu+\pi_{2}\nu)/2$ . Then

[TABLE]

Proof.

Recall that $m=nk/2$ is the number of edges in $G$ . Note that $m\pi_{1}\nu(x)$ is the number of edges $(u,v)$ such that $\theta_{u}=x$ , and $m\pi_{2}\nu(x)$ is the number of edges $(u,v)$ such that $\theta_{v}=x$ . Therefore $m(\pi_{1}\nu(x)+\pi_{2}\nu(x))/k=n(\pi_{1}\nu(x)+\pi_{2}\nu(x))/2$ is the number of vertices $u$ such that $\theta_{u}=x$ . Further $m\pi_{12}\nu(x_{1},x_{2})$ is the number of edges $(u,v)$ such that $\theta_{u}=x_{1}$ and $\theta_{v}=x_{2}$ .
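The vertex-counting identity in this proof can be checked on a sampled instance. The sketch below draws a configuration-model multigraph with directed edges and a random labelling $\bm{\theta}$ (sizes and seed are illustrative) and verifies that $k\cdot|\{u:\theta_u=x\}|$ equals the number of edge endpoints carrying label $x$, i.e. $m(\pi_1\nu(x)+\pi_2\nu(x))$.

```python
import random
from collections import Counter

rng = random.Random(2)
n, k, alphabet = 60, 4, 3
m = n * k // 2

# Configuration-model multigraph with directed edges (half-edge h sits at vertex h % n).
half_edges = list(range(n * k))
rng.shuffle(half_edges)
edges = [(half_edges[2 * i] % n, half_edges[2 * i + 1] % n) for i in range(m)]
theta = [rng.randrange(alphabet) for _ in range(n)]

# Counts of edges (u, v) with theta_u = x (= m * pi1(x)) and with theta_v = x (= m * pi2(x)).
pi1 = Counter(theta[u] for u, _ in edges)
pi2 = Counter(theta[v] for _, v in edges)
vertex_counts = Counter(theta)

for x in range(alphabet):
    # m * (pi1 + pi2)(x) / k should equal the number of vertices labelled x.
    print(x, vertex_counts[x], (pi1[x] + pi2[x]) / k)
```

The identity is exact because every vertex owns exactly $k$ half-edges and each half-edge is used once, as a tail or as a head (a self-loop uses two half-edges of the same vertex).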

Given a non-negative integer vector $(b(x))_{x\in S}$ with $b_{\mbox{\tiny\rm sum}}\equiv\sum_{x\in S}b(x)$ , we denote the corresponding multinomial coefficient by

[TABLE]

We then obtain the following exact counting formula (where $\nu_{v}(x)\equiv(\pi_{1}\nu(x)+\pi_{2}\nu(x))/2$ and $\nu_{12}=\pi_{12}\nu$ ):

[TABLE]

The first factor accounts for the number of ways of choosing ${\bm{\theta}}$ . The second corresponds to the number of ways of assigning a matching type to half-edges. The third factor counts the number of ways of matching half-edges, and the last one the number of ways of assigning labels in $\overline{\mathcal{Y}}$ to edges.

Equation (B.1) follows by using the following elementary bounds (that hold for any $N\in{\mathbb{N}}$ and any $p\in\mathscrsfs{P}(S)$ ):

[TABLE]

$\blacksquare$
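The elementary bounds invoked at the end of the proof of Lemma B.1 are elided in this version. Assuming they are the standard method-of-types estimates $(N+1)^{-|S|}e^{NH(p)}\leq\binom{N}{Np}\leq e^{NH(p)}$ for $p\in\mathscrsfs{P}(S)$ with $Np$ integer-valued, the following sketch checks them numerically on a few types, using log-gamma to avoid overflow (helper names are ours).

```python
import math

def log_multinomial(counts):
    """log of N! / prod_x b(x)!  with N = sum(counts)."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(b + 1) for b in counts)

def entropy_of_type(counts):
    """Shannon entropy (nats) of the empirical distribution b / N."""
    n = sum(counts)
    return -sum(b / n * math.log(b / n) for b in counts if b > 0)

checks = []
for counts in ([5, 3, 2], [10, 0, 10], [1, 1, 1, 1], [50, 30, 15, 5]):
    n, size = sum(counts), len(counts)
    lm, nh = log_multinomial(counts), n * entropy_of_type(counts)
    # -|S| log(N + 1) + N H(p) <= log multinomial <= N H(p)
    checks.append(-size * math.log(n + 1) + nh <= lm <= nh + 1e-9)
print(checks)
```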

Now recall the joint empirical distribution of two assignments ${\bm{\theta}}_{0},{\bm{\theta}}\in{\mathcal{X}}^{V}$ :

[TABLE]

Further, let $\overline{\nu}_{\mbox{\rm\tiny e}}(x_{1},x_{2},y)=\overline{\nu}(x_{1})\,\overline{\nu}(x_{2})\,Q(y|x_{1},x_{2})$ , $\overline{\nu}$ being the uniform distribution on ${\mathcal{X}}$ , and

[TABLE]

Given a graph $G$ , a true assignment ${\bm{\theta}}_{0}$ , observations ${Y}$ , and a closed set ${\mathcal{S}}\subseteq\mathscrsfs{P}({\mathcal{X}}\times{\mathcal{X}})$ we define

[TABLE]

where $\eta_{n}=(\log n)/\sqrt{n}$ . We denote by ${\mathcal{G}}_{n}$ the set of instances, i.e., triples $(G_{n},{\bm{\theta}}_{0},{Y})$ , where $G_{n}$ is a graph over $n$ vertices, ${\bm{\theta}}_{0}\in{\mathcal{X}}^{V_{n}}$ and ${Y}\in{\mathcal{Y}}^{E_{n}}$ .

Lemma B.2.

Assume there exists $c_{M}>0$ such that $c_{M}^{-1}\leq Q(y|x_{1},x_{2})\leq c_{M}$ for all $x_{1},x_{2}\in{\mathcal{X}}$ , $y\in{\mathcal{Y}}$ . Define the map $S:\mathscrsfs{P}({\mathcal{X}}^{2}\times{\mathcal{X}}^{2}\times{\mathcal{Y}})\mapsto{\mathbb{R}}$ by

[TABLE]

(Here $\pi_{1}$ , $\pi_{2}$ are defined as in Lemma B.1, with $\overline{\mathcal{X}}={\mathcal{X}}^{2}$ , and $H$ denotes the Shannon entropy.) Further define $S_{*}:\mathscrsfs{P}({\mathcal{X}}\times{\mathcal{X}})\mapsto{\mathbb{R}}$ by

[TABLE]

There is a set ${\mathcal{G}}_{n}^{*}\subseteq{\mathcal{G}}_{n}$ of ‘good’ instances such that the following happens. For ${\mathcal{S}}\subseteq\mathscrsfs{P}({\mathcal{X}}\times{\mathcal{X}})$ a closed set, we have

[TABLE]

Proof.

Given a tuple $(G,{\bm{\theta}}_{0},{\bm{\theta}},{Y})$ , where $G=(V,E)$ is a graph, ${\bm{\theta}},{\bm{\theta}}_{0}\in{\mathcal{X}}^{V}$ , ${Y}\in{\mathcal{Y}}^{E}$ , we define its joint edge empirical distribution $\widehat{\Omega}^{G}_{{\bm{\theta}}_{0},{\bm{\theta}},{Y}}\in\mathscrsfs{P}({\mathcal{X}}\times{\mathcal{X}}\times{\mathcal{X}}\times{\mathcal{X}}\times{\mathcal{Y}})$ as

[TABLE]

In other words $\widehat{\Omega}^{G}_{{\bm{\theta}}_{0},{\bm{\theta}},{Y}}(x_{1},\tilde{x}_{1},x_{2},\tilde{x}_{2},y_{12})$ is the probability that, sampling an edge $(u,v)\in E$ uniformly at random, we have $\theta_{0,u}=x_{1}$ , $\theta_{u}=\tilde{x}_{1}$ , $\theta_{0,v}=x_{2}$ , $\theta_{v}=\tilde{x}_{2}$ , $Y_{uv}=y_{12}$ . Let $\mathscrsfs{P}_{nk}({\mathcal{X}}^{4}\times{\mathcal{Y}})\subseteq\mathscrsfs{P}({\mathcal{X}}^{4}\times{\mathcal{Y}})$ be the subset of probability distributions with entries that are integer multiples of $1/|E|=2/(nk)$ . For $\Omega\in\mathscrsfs{P}_{nk}({\mathcal{X}}^{4}\times{\mathcal{Y}})$ , we let $N_{n,k}(\Omega)$ denote the number of tuples with edge empirical distribution equal to $\Omega$ :

[TABLE]

Notice that setting $\overline{\mathcal{X}}={\mathcal{X}}\times{\mathcal{X}}$ , we can view $({\bm{\theta}}_{0},{\bm{\theta}})$ as a vector in $\overline{\mathcal{X}}^{V}$ and $\Omega$ as a probability distribution in $\mathscrsfs{P}(\overline{\mathcal{X}}\times\overline{\mathcal{X}}\times{\mathcal{Y}})$ . Applying Eq. (B.1) and Lemma B.1, we get

[TABLE]

We define

[TABLE]

Then Eq. (B.6) follows immediately from Lemma 5.1.

We also define $B_{\mbox{\rm\tiny TV}}(\overline{\nu}_{\mbox{\rm\tiny e}};\eta_{n})\equiv\{\nu\in\mathscrsfs{P}({\mathcal{X}}\times{\mathcal{X}}\times{\mathcal{Y}}):\;\;d_{\mbox{\rm\tiny TV}}(\nu,\overline{\nu}_{\mbox{\rm\tiny e}})\leq\eta_{n}\}$ . With this notation

[TABLE]

and therefore, using Eq. (B.9),

[TABLE]

Recall the definition of $\widehat{\Omega}^{G}_{{\bm{\theta}}_{0},{\bm{\theta}},{Y}}$ from Eq. (B.7). We observe that this empirical measure has the following marginals:

[TABLE]

Moreover, if $Q$ does not vanish, we have

[TABLE]

Therefore the summand in formula (B.10) depends only on the empirical edge distribution $\widehat{\Omega}^{G}_{{\bm{\theta}}_{0},{\bm{\theta}},{Y}}$ of the instance $(G,{\bm{\theta}}_{0},{\bm{\theta}},{Y})$ . Now let $\mathscrsfs{Q}(\eta_{n})\subseteq\mathscrsfs{P}({\mathcal{X}}^{4}\times{\mathcal{Y}})$ be the set of $\Omega\in\mathscrsfs{P}({\mathcal{X}}^{4}\times{\mathcal{Y}})$ satisfying the constraints

[TABLE]

We have

[TABLE]

We applied Lemma B.1 in the last line above. Due to the second constraint in (B.11), we can upper bound $F(\Omega)$ as follows

[TABLE]

Therefore, letting

[TABLE]

we arrive at

[TABLE]

which implies the claim. $\blacksquare$

The next result provides a sufficient condition for weak recovery using the estimator $\hat{\bm{\theta}}$ satisfying Eq. (5.2); this is a more general version of Theorem C.

Theorem E.

Assume there exists $c_{M}>0$ such that $c_{M}^{-1}\leq Q(y|x_{1},x_{2})\leq c_{M}$ for all $x_{1},x_{2}\in{\mathcal{X}}$ , $y\in{\mathcal{Y}}$ . Assume $S_{*}(\overline{\nu}\times\overline{\nu})<-\varepsilon<0$ . Then there exists $\delta=\delta(\varepsilon,c_{M})>0$ such that, with probability at least $1-c_{0}^{-1}\exp\{-c_{0}(\log n)^{2}\}$ , the following happens

[TABLE]

Proof.

Recall that $B_{\mbox{\rm\tiny TV}}(\overline{\nu}\times\overline{\nu};\delta)$ denotes the set of probability distributions $\omega\in\mathscrsfs{P}({\mathcal{X}}\times{\mathcal{X}})$ such that $d_{\mbox{\rm\tiny TV}}(\omega,\overline{\nu}\times\overline{\nu})\leq\delta$ . We claim that, under the stated assumptions, there exist $\delta,c_{1}>0$ such that, setting ${\mathcal{S}}_{\delta}=B_{\mbox{\rm\tiny TV}}(\overline{\nu}\times\overline{\nu};\delta)$ , and ${\mathcal{G}}_{*}$ as in Lemma B.2, we have

[TABLE]

Hence, applying Lemma B.2, it follows that, with probability at least $1-c_{0}^{-1}\exp\{-c_{0}(\log n)^{2}\}$ (possibly adjusting the constant $c_{0}$ ), $Z({\mathcal{S}}_{\delta};G_{n},{\bm{\theta}}_{0},{Y})=0$ . Hence $\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}}\not\in{\mathcal{S}}_{\delta}$ by construction of $\hat{\bm{\theta}}$ , and therefore the claim follows.

We are left with the task of proving Eq. (B.13), which, by Lemma B.2 and a continuity argument, follows from $S_{*}(\overline{\nu}\times\overline{\nu})<-\varepsilon$ . $\blacksquare$

The condition $S_{*}(\overline{\nu}\times\overline{\nu})<-\varepsilon$ might be hard to verify in practice because it requires solving the optimization problem (B.5). We provide a simpler sufficient condition, which is the content of Theorem C:

Lemma B.3.

Let $(\theta_{1},\theta_{2},Y)\sim\overline{\nu}_{e}$ , with $\overline{\nu}_{e}(x_{1},x_{2},y)=\overline{\nu}(x_{1})\overline{\nu}(x_{2})Q(y|x_{1},x_{2})$ . We have $S_{*}(\overline{\nu}\times\overline{\nu})\leq-\frac{k}{2}I(\theta_{1},\theta_{2};Y)+H(\theta_{1})$ .

Proof.

Let $\Omega_{*}\in\mathscrsfs{P}({\mathcal{X}}^{2}\times{\mathcal{X}}^{2}\times{\mathcal{Y}})$ be any distribution achieving the maximum in (B.5) for $\omega=\overline{\nu}\times\overline{\nu}$ , and let $(X_{1},\tilde{X}_{1},X_{2},\tilde{X}_{2},Y)$ have distribution $\Omega_{*}$ . Note that $(X_{1},X_{2},Y)\sim\overline{\nu}_{e}$ , $(\tilde{X}_{1},\tilde{X}_{2},Y)\sim\overline{\nu}_{e}$ , $(X_{1},\tilde{X}_{1})\sim\omega$ , $(X_{2},\tilde{X}_{2})\sim\omega$ , $\omega=\overline{\nu}\times\overline{\nu}$ , whence

[TABLE]

Step $(a)$ follows by sub-additivity of entropy. $\blacksquare$

Hence, if $\frac{k}{2}I(\theta_{1},\theta_{2};Y)\geq H(\theta_{1})+\varepsilon$ , then $S_{*}(\overline{\nu}\times\overline{\nu})<-\varepsilon<0$ , and the claim follows by applying Theorem E.
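As an illustration of Lemma B.3, consider $\mathbb{Z}_q$ synchronization (1.1) with uniform $\theta_1,\theta_2$. Then $Y$ is uniform on $\mathbb{Z}_q$ and $H(\theta_1)=\log q$, so $I(\theta_1,\theta_2;Y)=\log q-H(Y|\theta_1,\theta_2)$. The sketch below computes this quantity and the smallest degree $k$ with $\frac{k}{2}I(\theta_1,\theta_2;Y)>\log q$; the function names are ours, and the strict inequality stands in for the margin $\varepsilon>0$. Note that Theorem E additionally requires $Q$ to be bounded away from zero, i.e. $p>0$.

```python
import math

def mutual_information_zq(q, p):
    """I(theta_1, theta_2; Y) in nats for Z_q synchronization with uniform thetas.

    Given (theta_1, theta_2), Y equals theta_1 - theta_2 w.p. 1 - p + p/q and
    each other value of Z_q w.p. p/q; marginally Y is uniform, so H(Y) = log q.
    """
    probs = [1 - p + p / q] + [p / q] * (q - 1)
    h_cond = -sum(x * math.log(x) for x in probs if x > 0)
    return math.log(q) - h_cond

def first_moment_degree(q, p):
    """Smallest k with (k/2) * I(theta_1, theta_2; Y) > H(theta_1) = log q."""
    info = mutual_information_zq(q, p)
    if info <= 0:
        raise ValueError("observations carry no information (p = 1)")
    k = 1
    while k * info / 2 <= math.log(q):
        k += 1
    return k

print(mutual_information_zq(3, 0.5), first_moment_degree(3, 0.5))
```

For $q=3$, $p=1/2$ one gets $I\approx 0.231$ nats, so the sufficient condition holds from $k=10$ on.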

B.3 Proof of Corollary 5.3

Let $\mathscr{B}_{q\times q}$ be the set of all $q\times q$ non-negative doubly stochastic matrices (with $q=|{\mathcal{X}}|$ ). It holds that

[TABLE]

Indeed, since the right-most expression in the above display is a linear program, the objective value is maximized at the extreme points of the polytope $\mathscr{B}_{q\times q}$ , which by Birkhoff’s theorem are permutation matrices: $\pi(x,y)={\mathbf{1}}_{y=\sigma(x)}$ for $\sigma\in\mathscr{S}_{q}$ , hence the equality.
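The Birkhoff argument can be checked numerically: a linear objective $\sum_{x,y}w(x,y)\pi(x,y)$ over doubly stochastic matrices $\pi$ never exceeds its maximum over permutation matrices. The sketch below generates doubly stochastic matrices by Sinkhorn scaling of random positive matrices (a convenient choice of ours, not from the paper) and compares against a brute-force maximum over permutations.

```python
import itertools
import random

def max_over_permutations(w):
    """max over sigma of sum_x w[x][sigma(x)], by brute force."""
    q = len(w)
    return max(sum(w[x][s[x]] for x in range(q)) for s in itertools.permutations(range(q)))

def sinkhorn(a, iters=200):
    """Alternately normalize rows and columns of a positive matrix (doubly stochastic limit)."""
    q = len(a)
    m = [row[:] for row in a]
    for _ in range(iters):
        m = [[v / sum(row) for v in row] for row in m]                 # rows sum to 1
        col = [sum(m[i][j] for i in range(q)) for j in range(q)]
        m = [[m[i][j] / col[j] for j in range(q)] for i in range(q)]  # columns sum to 1
    return m

rng = random.Random(3)
q = 4
w = [[rng.random() for _ in range(q)] for _ in range(q)]
best_perm = max_over_permutations(w)
gaps = []
for _ in range(200):
    pi = sinkhorn([[rng.random() + 0.01 for _ in range(q)] for _ in range(q)])
    value = sum(w[x][y] * pi[x][y] for x in range(q) for y in range(q))
    gaps.append(best_perm - value)  # non-negative by Birkhoff's theorem
print(min(gaps))
```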

Since $q\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}}\in\mathscr{B}_{q\times q}$ (we abused notation and identified the joint distribution $\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}}$ on ${\mathcal{X}}\times{\mathcal{X}}$ with a $q\times q$ matrix), we have

[TABLE]

Now, on the event $d_{\mbox{\rm\tiny TV}}(\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}},\overline{\nu}\times\overline{\nu})\geq\delta$ , we have $\sum_{x,x^{\prime}}(\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}}(x,x^{\prime}))^{2}\geq\frac{1}{q^{2}}+\frac{\delta^{2}}{q}$ . Hence $\textup{{overlap}}(\hat{\bm{\theta}},{\bm{\theta}}_{0})\geq\frac{1}{q}+\delta^{2}$ on the same event.

Next, we prove the second statement. For two functions $f,g:{\mathcal{X}}\mapsto\mathbb{R}$ , we let $\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}}(f,g):=\sum_{x_{1},x_{2}}\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}}(x_{1},x_{2})f(x_{1})g(x_{2})$ . Theorem C implies

[TABLE]

Indeed, if $d_{\mbox{\rm\tiny TV}}(\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}},\overline{\nu}\times\overline{\nu})\geq\delta$ then there exist $x_{1},x_{2}\in{\mathcal{X}}$ such that $|\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}}(x_{1},x_{2})-\frac{1}{q^{2}}|\geq\frac{\delta}{q^{2}}$ . Now take $f=(\delta_{x_{1}}-\frac{1}{q})\frac{q}{q-1}$ and $g=(\delta_{x_{2}}-\frac{1}{q})\frac{q}{q-1}$ .

On the other hand, letting $\mathcal{F}:=\{f=(\delta_{x}-\frac{1}{q})\frac{q}{q-1},x\in{\mathcal{X}}\}$ , a union bound implies

[TABLE]

Therefore, there exists a (deterministic) pair $f,g\in\mathcal{F}$ such that $\operatorname{\mathbb{P}}\big{(}|\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}}(f,g)|\geq\frac{\delta}{q(q-1)}\big{)}\geq\frac{1-o_{n}(1)}{q^{2}}>c_{0}>0$ . By Markov’s inequality, this in turn implies that for this specific pair $f,g\in\mathcal{F}$ we have

[TABLE]

Now consider estimating the matrix ${\bm{X}}_{f}$ (recall that $(X_{f})_{uv}=f(\theta_{u})f(\theta_{v})$ ) with the matrix $\widehat{{\bm{X}}}^{(\lambda)}$ having entries $\widehat{X}^{(\lambda)}_{uv}=\lambda g(\hat{\theta}_{u})g(\hat{\theta}_{v})$ , with $\lambda=n^{2}\operatorname{\mathbb{E}}\big{[}\hat{\omega}_{\hat{\bm{\theta}},{\bm{\theta}}_{0}}(f,g)^{2}\big{]}\big{/}\operatorname{\mathbb{E}}\big{[}\|\widehat{{\bm{X}}}^{(1)}\|_{F}^{2}\big{]}$ . Since

[TABLE]

the loss $\mathcal{R}_{n}$ incurred is

[TABLE]

We have $\operatorname{\mathbb{E}}\|{\bm{X}}_{f}\|_{F}^{2}=\sum_{u,v\in V_{n}}\operatorname{\mathbb{E}}[f(\theta_{u})^{2}f(\theta_{v})^{2}]=\frac{n}{q}\sum_{x\in\mathbb{Z}_{q}}f(x)^{4}+n(n-1)$ . So $\lim\frac{1}{n^{2}}\operatorname{\mathbb{E}}\|{\bm{X}}_{f}\|_{F}^{2}=1$ . Furthermore, since $\|g\|_{\infty}=1$ , $\operatorname{\mathbb{E}}\|\widehat{{\bm{X}}}^{(1)}\|_{F}^{2}=\sum_{u,v\in V_{n}}\operatorname{\mathbb{E}}[g(\hat{\theta}_{u})^{2}g(\hat{\theta}_{v})^{2}]\leq n^{2}$ . Combining these estimates with the lower bound (B.15) implies $\limsup\mathcal{R}_{n}\big{(}\widehat{{\bm{X}}}^{(\lambda)};f\big{)}<1-c(q)\delta^{2}$ . Since $\mathcal{R}_{n}^{\textup{Bayes}}(f)\leq\mathcal{R}_{n}\big{(}\widehat{{\bm{X}}}^{(\lambda)};f\big{)}$ this concludes the proof.

Appendix C Local algorithms on random graphs: Technical proofs

C.1 Proof of Theorem D

C.1.1 Preliminaries

Let $(T_{k},o)$ denote the infinite $k$ -regular tree rooted at $o$ . (Every vertex other than the root $o$ has $k-1$ children.) By expanding the square, we get

[TABLE]

(Here, $d_{\ell_{2}}$ is the $\ell_{2}$ distance in $\mathbb{R}^{q}$ .) Since the graph sequence $(G_{n})_{n\geq 1}$ almost surely converges locally–weakly to (a Dirac delta on) $(T_{k},o)$ , we have

[TABLE]

Recall

[TABLE]

Let $Q_{x}=\text{Law}(\mu_{o,l}|\theta_{o}=x,\xi^{(\varepsilon)}_{o}=\star)$ be the conditional law of $\mu_{o,l}$ given the value at the root being $x$ and no information revealed by the side channel. This is a probability distribution on the simplex $\Delta^{q-1}=\mathscrsfs{P}(\mathbb{Z}_{q})$ : $Q_{x}\in\mathscrsfs{P}(\Delta^{q-1})$ . Furthermore, let $Q=\text{Law}(\mu_{o,l}|\xi^{(\varepsilon)}_{o}=\star)=\frac{1}{q}\sum_{x\in\mathbb{Z}_{q}}Q_{x}$ . The following simple lemma from [MM06] is quite useful.

Lemma C.1.

For every $x\in\mathbb{Z}_{q}$ , $Q_{x}$ has a density w.r.t. $Q$ , and $\frac{\mathrm{d}Q_{x}}{\mathrm{d}Q}(\mu)=q\mu(x)$ for all $\mu\in\Delta^{q-1}$ .

Proof.

Let $\psi:\Delta^{q-1}\to\mathbb{R}$ be bounded measurable. We let $Y\equiv\{Y^{(\varepsilon)}_{B_{T_{k}}(o,l)}\}$ . Then

[TABLE]

Therefore $\mathrm{d}Q_{x}/\mathrm{d}Q(\mu)=q\mu(x)$ . $\blacksquare$
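Lemma C.1 is an instance of Bayes’ rule. The sketch below checks it exactly, with rational arithmetic, on a toy one-level tree: a root with $k=2$ children, $\mathbb{Z}_3$ synchronization edges, and each child revealed through the erasure side channel with probability $\varepsilon$ (all parameters are illustrative, and the root’s own side information is taken to be erased). It groups observations by the induced root posterior $\mu$ and verifies $Q_x(\{\mu\})=q\,\mu(x)\,Q(\{\mu\})$ for every atom.

```python
import itertools
from collections import defaultdict
from fractions import Fraction

q, k = 3, 2                       # alphabet Z_3, root with k = 2 children (one level)
p, eps = Fraction(2, 5), Fraction(1, 2)
STAR = None                       # erased side information

def channel(y, x, z):
    """P(Y_ou = y | theta_o = x, theta_u = z) for Z_q synchronization."""
    return p / q + (1 - p) * (1 if y == (x - z) % q else 0)

def side_info(xi, z):
    """P(xi_u | theta_u = z): reveal theta_u w.p. eps, erase otherwise."""
    if xi is STAR:
        return 1 - eps
    return eps if xi == z else Fraction(0)

def child_likelihood(x, y, xi):
    """P(Y_ou = y, xi_u = xi | theta_o = x), summing out a uniform theta_u."""
    return sum(Fraction(1, q) * side_info(xi, z) * channel(y, x, z) for z in range(q))

child_obs = [(y, xi) for y in range(q) for xi in [STAR] + list(range(q))]
q_x = [defaultdict(Fraction) for _ in range(q)]  # Q_x, as weights on posterior vectors
q_mix = defaultdict(Fraction)                     # Q = (1/q) sum_x Q_x
for obs in itertools.product(child_obs, repeat=k):
    lik = [Fraction(1)] * q
    for x in range(q):
        for y, xi in obs:
            lik[x] *= child_likelihood(x, y, xi)
    mu = tuple(w / sum(lik) for w in lik)        # root posterior under a uniform prior
    for x in range(q):
        q_x[x][mu] += lik[x]
    q_mix[mu] += sum(lik) / q

# Lemma C.1: dQ_x/dQ(mu) = q * mu(x) on every atom mu.
ok = all(q_x[x][mu] == q * mu[x] * q_mix[mu] for mu in q_mix for x in range(q))
print(ok, len(q_mix))
```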

With the above lemma in hand, the right-hand side in (C.1) can be written as

[TABLE]

The first equality follows by conditioning on $\xi^{(\varepsilon)}_{o}$ and noting that, conditional on $\xi^{(\varepsilon)}_{o}\neq\star$ , $\mu_{o}(x)={\mathbf{1}}\{x=\xi_{o}^{(\varepsilon)}\}$ . Lemma C.1 was used to obtain the second equality.

In light of the above expression, we will track the evolution of the sequence

[TABLE]

which measures the deviation from uniformity of the local marginal at the root. In order to exploit the recursive structure of the tree, we will need to work at the level of the children of $o$ . For every child $u$ of $o$ , we denote by $T^{\downarrow}(u,l)$ the first $l$ generations of the subtree rooted at $u$ not containing $o$ ; this is a $(k-1)$ –ary tree. Now, with a slight abuse of notation, we redefine

[TABLE]

and consider the auxiliary sequence

[TABLE]

Note that the above definition does not depend on $u$ since $\mu_{u,l}(\theta_{u})$ has the same distribution for all $u\sim o$ . In the next proposition, we relate the two sequences $(\hat{z}_{o,l})_{l\geq 0}$ and $(z_{l})_{l\geq 0}$ , and establish a recursion for the latter.

Proposition C.2.

Let $\kappa=(k-1)(1-p)^{2}$ and $\hat{\kappa}=k(1-p)^{2}$ . There exist constants $c,C>0$ depending only on $q$ such that the following holds. If for some $l\geq 1$ , $\hat{\kappa}|z_{l-1}|\leq c$ and $\hat{\kappa}\varepsilon\leq c$ , then

[TABLE]

The proof of this proposition is presented in Section C.1.2. Theorem D follows directly from Proposition C.2, as shown in the next corollary.

Corollary C.3.

If $\kappa<1$ and $\hat{\kappa}\varepsilon<c$ for a constant $c=c(q,\kappa)$ then there exists $L=L(q,\kappa)$ such that $|\hat{z}_{o,l}|\leq L\varepsilon$ for all $l\geq 0$ .

Proof.

We only need to prove that $|z_{l}|\leq L\varepsilon$ , which we will achieve by induction. Since $z_{0}=0$ , the claim holds for $l=0$ ; assume that $|z_{l}|\leq L\varepsilon$ for a fixed $l\geq 0$ . Then we obtain from Proposition C.2 that

[TABLE]

It suffices to find an $L$ (independent of $\varepsilon$ ) such that the above upper bound is smaller than $L\varepsilon$ for all $\varepsilon$ small enough. This is equivalent to the quadratic inequality $\kappa\frac{q-1}{q}+C\kappa^{2}\varepsilon-(1-\kappa)L+C\kappa^{2}\varepsilon L^{2}\leq 0$ . The smallest solution to this inequality is $L_{*}=\frac{1-\kappa-\sqrt{\Delta}}{2\alpha}$ , with $\alpha=C\kappa^{2}\varepsilon$ , and $\Delta=(1-\kappa)^{2}-4\alpha(\alpha+\frac{q-1}{q}\kappa)$ . The latter is non-negative provided that $\varepsilon<c_{0}(q)(\frac{1}{\kappa}-1)^{2}$ for some constant $c_{0}(q)>0$ . Moreover, for $\varepsilon$ small enough we can write $\sqrt{\Delta}=(1-\kappa)(1-2\alpha(\alpha+\frac{q-1}{q}\kappa)/(1-\kappa)^{2})+\mathcal{O}(\varepsilon^{2})$ , so that $L_{*}=(\alpha+\frac{q-1}{q}\kappa)/(1-\kappa)+\mathcal{O}(\varepsilon)$ . Therefore, since $\alpha\leq C$ for $\varepsilon\leq 1$ , we can take $L=(C+\frac{q-1}{q}\kappa)/(1-\kappa)+1$ . $\blacksquare$
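The induction can be illustrated by iterating the scalar bound. The sketch below assumes that the bound of Proposition C.2 takes the form $|z_{l+1}|\leq\frac{q-1}{q}\kappa\varepsilon+\kappa|z_l|+C\kappa^2(\varepsilon^2+z_l^2)$, which is the form encoded by the quadratic inequality in the proof of Corollary C.3 (the exact display is elided here), with illustrative values of $q,\kappa,C,\varepsilon$. Starting from $z_0=0$, the iterates increase to the smallest fixed point, which equals $L_*\varepsilon$, and $L_*$ is close to $\frac{q-1}{q}\kappa/(1-\kappa)$ for small $\varepsilon$.

```python
import math

def smallest_root(q, kappa, C, eps):
    """L_* = smallest root of alpha L^2 - (1 - kappa) L + (alpha + beta) = 0."""
    alpha = C * kappa ** 2 * eps
    beta = (q - 1) / q * kappa
    disc = (1 - kappa) ** 2 - 4 * alpha * (alpha + beta)
    if disc < 0:
        raise ValueError("eps too large: no admissible L")
    # Stable rewriting of (1 - kappa - sqrt(disc)) / (2 alpha), avoiding cancellation.
    return 2 * (alpha + beta) / ((1 - kappa) + math.sqrt(disc))

def iterate_bound(q, kappa, C, eps, steps=300):
    """Iterate z -> kappa (q-1)/q eps + kappa z + C kappa^2 (eps^2 + z^2) from z_0 = 0."""
    z, path = 0.0, [0.0]
    for _ in range(steps):
        z = kappa * (q - 1) / q * eps + kappa * z + C * kappa ** 2 * (eps ** 2 + z ** 2)
        path.append(z)
    return path

q, kappa, C, eps = 3, 0.5, 1.0, 0.01
L_star = smallest_root(q, kappa, C, eps)
path = iterate_bound(q, kappa, C, eps)
print(L_star, path[-1] / eps, (q - 1) / q * kappa / (1 - kappa))
```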

C.1.2 Proof of Proposition C.2: Analysis of the recursion on the tree

Here, we prove Proposition C.2. The two statements can be treated in exactly the same way; the only difference is that the root $o$ has $k$ children, while every other vertex has $k-1$ children. For this reason we only write a detailed proof for the first statement; the second one is obtained by replacing $k$ with $k-1$ .

Observe that, conditional on $\xi_{o}^{(\varepsilon)}=\star$ , the marginal at $o$ is obtained from the marginals at its children $u\sim o$ by a sum-product relation which, in the case of $\mathbb{Z}_{q}$ –synchronization, has the form

[TABLE]

where $\Sigma_{o,l}$ is the normalizing constant, and $M_{x,y}(Y_{ou})=\operatorname{\mathbb{P}}(\theta_{o}=x|\theta_{u}=y,Y_{ou})=\frac{p}{q}+(1-p){\mathbf{1}}\{Y_{ou}=x-y\}$ is the Markov transition matrix associated to a ‘broadcasting process’ on the tree according to the $\mathbb{Z}_{q}$ –synchronization model.
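A minimal sketch of the sum-product step, assuming (C.2) takes the standard form $\mu_{o}(x)\propto\prod_{u\sim o}\sum_{y}M_{x,y}(Y_{ou})\mu_{u}(y)$ with the transition matrix $M$ defined above (the display itself is elided here). The function name and argument layout are ours.

```python
def bp_update(child_marginals, edge_obs, q, p):
    """Root marginal from children marginals for Z_q synchronization (sum-product).

    mu_o(x) is proportional to prod_u sum_y M_{x,y}(Y_ou) mu_u(y), with
    M_{x,y}(Y) = p/q + (1 - p) 1{Y = x - y mod q}.
    """
    unnormalized = []
    for x in range(q):
        prod = 1.0
        for mu_u, y_ou in zip(child_marginals, edge_obs):
            prod *= sum((p / q + (1 - p) * (y_ou == (x - y) % q)) * mu_u[y] for y in range(q))
        unnormalized.append(prod)
    total = sum(unnormalized)  # plays the role of the normalizing constant Sigma_{o,l}
    return [v / total for v in unnormalized]

# Two children known to be in state 0, both edges reporting theta_o - theta_u = 1:
mu_root = bp_update([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1, 1], q=3, p=0.2)
print(mu_root)
```

Uniform children marginals give a uniform root marginal, since each row of $M(Y)$ sums to $1$ over $y$.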

The recursion (C.2) induces a deterministic recursion over probability distributions over the simplex $\Delta^{q-1}=\mathscrsfs{P}(\mathbb{Z}_{q})$ . Namely, if we define $Q^{(l)}_{x}:=\text{Law}(\mu_{o,l}|\theta_{o}=x)\in\mathscrsfs{P}(\Delta^{q-1})$ , we obtain a recursion that determines $Q^{(l)}_{x}$ in terms of $Q^{(l-1)}_{x}$ (notice that, by Lemma C.1, once $Q^{(l)}_{x}$ is given for one value of $x$ , it is determined for the other values as well.) The laws of $\mu_{u,l-1}$ are given by $\text{Law}(\mu_{u,l-1}|\theta_{u}=x)=Q^{(l-1)}_{x}$ for all $u\sim o$ . Note that this law does not depend on $u$ since $\mu_{u,l-1}$ are i.i.d. given $\theta_{o}$ . Then $Q^{(l)}_{x}$ can be obtained from $Q^{(l-1)}_{x}$ as follows:

1. Draw $\theta_{o}$ and $\theta_{u},\forall u\sim o$ independently and uniformly at random from $\mathbb{Z}_{q}$ .
2. Construct $\{Y_{ou},u\sim o\}$ according to the $\mathbb{Z}_{q}$ –synchronization model (1.1).
3. Draw $\mu_{u,l-1}$ from $Q^{(l-1)}_{\theta_{u}}$ independently for each $u\sim o$ .
4. Construct a distribution $\mu$ according to (C.2).

Then, given $\xi^{(\varepsilon)}_{o}=\star$ , $\mu_{o,l}$ has the same law as $\mu$ .
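The four steps above describe a distributional recursion that can be approximated by population dynamics, a standard numerical device (our choice here; it is not used in the paper). In the sketch below each $Q^{(l)}_x$ is represented by a finite population of marginals; we condition on $\theta_o=x$ directly instead of drawing $\theta_o$ and bucketing, we let the side channel reveal $\theta_o$ with probability $\varepsilon$ (so the population represents the unconditional law), and we start from uninformative marginals at depth $0$. The sum-product step assumes the form of (C.2) stated above; the number of children $d$, the parameters and the population size are illustrative. Here $d(1-p)^2<1$, so, in line with Corollary C.3, one expects the bias to stay of order $\varepsilon$.

```python
import random

def bp_update(child_marginals, edge_obs, q, p):
    # Assumed form of (C.2): mu(x) proportional to prod_u sum_y M_{x,y}(Y_ou) mu_u(y).
    weights = []
    for x in range(q):
        prod = 1.0
        for mu_u, y_ou in zip(child_marginals, edge_obs):
            prod *= sum((p / q + (1 - p) * (y_ou == (x - y) % q)) * mu_u[y] for y in range(q))
        weights.append(prod)
    total = sum(weights)
    return [w / total for w in weights]

def sample_y(x, z, q, p, rng):
    """Z_q synchronization channel: x - z mod q w.p. 1 - p, uniform noise w.p. p."""
    return (x - z) % q if rng.random() > p else rng.randrange(q)

def population_step(pops, q, p, eps, d, rng):
    """One step of the recursion: pops[x] approximates Q^{(l-1)}_x."""
    size = len(pops[0])
    new_pops = []
    for x in range(q):                                    # condition on theta_o = x
        new = []
        for _ in range(size):
            if rng.random() < eps:                        # side channel reveals theta_o
                new.append([1.0 if y == x else 0.0 for y in range(q)])
                continue
            zs = [rng.randrange(q) for _ in range(d)]     # step 1: children labels
            ys = [sample_y(x, z, q, p, rng) for z in zs]  # step 2: edge observations
            mus = [rng.choice(pops[z]) for z in zs]       # step 3: mu_u ~ Q^{(l-1)}_{theta_u}
            new.append(bp_update(mus, ys, q, p))          # step 4: sum-product
        new_pops.append(new)
    return new_pops

def bias(pops, q):
    """Population estimate of E[mu(theta_o)] - 1/q."""
    vals = [mu[x] - 1 / q for x in range(q) for mu in pops[x]]
    return sum(vals) / len(vals)

rng = random.Random(4)
q, p, eps, d, size = 3, 0.6, 0.05, 2, 2000
pops = [[[1.0 / q] * q for _ in range(size)] for _ in range(q)]  # depth 0: uninformative
for level in range(8):
    pops = population_step(pops, q, p, eps, d, rng)
b = bias(pops, q)
print(b)
```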

We now analyze the map described above. Define

[TABLE]

so $\mu_{o}(x)=Z_{o}(x)/\sum_{y}Z_{o}(y)$ , where we have dropped the indices $l$ for convenience. Following the analysis of [Sly11], we use the identity $\frac{a}{b+c}=\frac{a}{b}-\frac{ac}{b^{2}}+\frac{c^{2}}{b^{2}}\frac{a}{b+c}$ with $a=Z_{o}(x)$ , $b=q$ and $c=\sum_{y}Z_{o}(y)-q$ to write

[TABLE]
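For completeness, the algebraic identity used above follows by bringing the left-hand side to the common denominator $b^{2}(b+c)$:

```latex
\frac{a}{b}-\frac{ac}{b^{2}}+\frac{c^{2}}{b^{2}}\,\frac{a}{b+c}
  =\frac{ab(b+c)-ac(b+c)+ac^{2}}{b^{2}(b+c)}
  =\frac{ab^{2}}{b^{2}(b+c)}
  =\frac{a}{b+c}.
```

With $b=q$ and $c=\sum_{y}Z_{o}(y)-q$ one has $b+c=\sum_{y}Z_{o}(y)$, so only the last term carries the exact normalization, while the first two are explicit in $Z_{o}$.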

Next we compute the conditional expectations of $Z_{o}(y)$ and $Z_{o}(y)Z_{o}(y^{\prime})$ (given $\theta_{o}=x$ and $\xi_{o}=\star$ ) in order to control $\operatorname{\mathbb{E}}[\mu_{o}(\theta_{o})|\xi_{o}=\star]$ .

Lemma C.4.

Let $\delta_{u}:=\mu_{u,l}-\frac{1}{q}$ for $u\sim o$ . For all $x,y,y^{\prime}\in\mathbb{Z}_{q}$ , we have

[TABLE]

and

[TABLE]

Proof.

We start with the first identity (C.4). Since the distributions $\{(\mu_{u},Y_{ou}):\;u\sim o\}$ are conditionally independent given $\theta_{o}$ , we have

[TABLE]

Moreover,

[TABLE]

The first term in the right-hand side is $\operatorname{\mathbb{P}}(y-Y_{ou}=\theta_{u}|\theta_{o}=x)=(1-p){\mathbf{1}}_{x=y}+\frac{p}{q}$ . The second term is

[TABLE]

Therefore

[TABLE]

So we obtain

[TABLE]

where $u$ is an arbitrary child of $o$, since all the factors in the product are equal. Now we deal with the second identity (C.5):

[TABLE]

As in the previous computation, we have

[TABLE]

and

[TABLE]

Combining and rearranging terms we obtain the desired result. $\blacksquare$
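The probability $\operatorname{\mathbb{P}}(y-Y_{ou}=\theta_{u}|\theta_{o}=x)=(1-p){\mathbf{1}}_{x=y}+\frac{p}{q}$ used in the proof can be confirmed by exact enumeration over $\theta_{u}$ and the noise. The sketch assumes model (1.1) in the form $Y_{ou}=\theta_{o}-\theta_{u}\pmod q$ with probability $1-p$, and $Y_{ou}=w_{ou}$ uniform on $\mathbb{Z}_{q}$ with probability $p$. The function name `prob_shift_matches` is hypothetical.

```python
from fractions import Fraction


def prob_shift_matches(x, y, p, q):
    """P(y - Y_ou = theta_u | theta_o = x), by exact enumeration (model (1.1) assumed)."""
    total = Fraction(0)
    for theta_u in range(q):               # theta_u uniform on Z_q
        prior = Fraction(1, q)
        # clean observation: Y_ou = x - theta_u
        if (y - (x - theta_u)) % q == theta_u:
            total += prior * (1 - p)
        # corrupted observation: Y_ou = w, with w uniform on Z_q
        for w in range(q):
            if (y - w) % q == theta_u:
                total += prior * p * Fraction(1, q)
    return total


p, q = Fraction(1, 3), 4
ok = all(
    prob_shift_matches(x, y, p, q) == (1 - p) * (x == y) + p / q
    for x in range(q) for y in range(q)
)
print(ok)  # True
```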

Now we use the expressions just obtained to produce Taylor estimates for each term in the decomposition (C.3).

Lemma C.5.

Let $X=\operatorname{\mathbb{E}}[\delta_{u}(\theta_{u})|\xi_{u}=\star]$ and $\hat{\kappa}=k(1-p)^{2}$. There exist constants $c,C$ depending only on $q$ such that, if $\hat{\kappa}|X|\leq c$ and $\hat{\kappa}\varepsilon<c$, then

[TABLE]

Proof.

We use $|(1+x)^{d}-1-dx|\leq e^{c}d^{2}x^{2}$, valid for all $x$ such that $d|x|\leq c$. Applying this to (C.4) yields (C.6). For $k(1-p)^{2}q|X|\leq 1/2$ and $k(1-p)^{2}(q-1)\varepsilon<1/2$ we have

[TABLE]
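The second-order Taylor bound $|(1+x)^{d}-1-dx|\leq e^{c}d^{2}x^{2}$ for $d|x|\leq c$ can be spot-checked on a grid. The ranges of $d$ and $c$ below (with $c\leq 1$) and the grid size are arbitrary choices for illustration; the helper names are hypothetical.

```python
import math


def taylor_gap_ok(d, x, c, tol=1e-12):
    """Check |(1+x)^d - 1 - d*x| <= e^c * d^2 * x^2 (intended for d*|x| <= c)."""
    lhs = abs((1 + x) ** d - 1 - d * x)
    return lhs <= math.exp(c) * d**2 * x**2 + tol


def check_grid(max_d=50, cs=(0.1, 0.5, 1.0), n_points=201):
    """Scan x over a uniform grid on [-c/d, c/d] for each d and c."""
    for c in cs:
        for d in range(1, max_d + 1):
            for i in range(n_points):
                x = -c / d + 2 * c / d * i / (n_points - 1)
                if not taylor_gap_ok(d, x, c):
                    return False
    return True


print(check_grid())  # True
```

The tolerance `tol` only absorbs floating-point cancellation near $x=0$.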

Next, we use (C.5), combined with the fact that $\sum_{y}\delta_{u}(y)=0$, to obtain that, if $\hat{\kappa}(|X|\vee\varepsilon)\leq c(q)$ for some constant $c(q)$, then

[TABLE]

where $\Sigma$ gathers all the terms other than 1 in the expression (C.5), and the constant $C$ depends on $c$ . We use the inequality $(\sum_{i=1}^{n}x_{i})^{2}\leq n\sum_{i}x_{i}^{2}$ to obtain

[TABLE]

The last term was obtained by applying the Cauchy-Schwarz inequality to the term $\sum_{z}\operatorname{\mathbb{E}}[\delta_{u}(y-z)\delta_{u}(y^{\prime}-z)|\xi_{u}=\star]$ in (C.5) and then replacing sums over $y,z$ by maxima. It remains to show that the last two terms in the display above are bounded by $X^{2}$, up to a constant depending only on $q$. Starting with the last term, we have

[TABLE]
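The two elementary inequalities used in this step, $(\sum_{i=1}^{n}x_{i})^{2}\leq n\sum_{i}x_{i}^{2}$ and the Cauchy-Schwarz bound $|\sum_{z}\delta(y-z)\delta(y^{\prime}-z)|\leq\sum_{z}\delta(z)^{2}$ for a vector $\delta$ indexed by $\mathbb{Z}_{q}$, can be spot-checked on random vectors. This is a deterministic, numerical illustration only; $q$, the number of trials, and the helper name are arbitrary.

```python
import numpy as np


def check_elementary_bounds(q=5, trials=500, seed=0):
    """Spot-check both inequalities on random vectors indexed by Z_q."""
    rng = np.random.default_rng(seed)
    zs = np.arange(q)
    for _ in range(trials):
        x = rng.normal(size=q)
        # (sum_i x_i)^2 <= n * sum_i x_i^2, with n = q
        if x.sum() ** 2 > q * np.sum(x**2) + 1e-12:
            return False
        delta = x - x.mean()              # zero-sum vector, like delta_u = mu_u - 1/q
        energy = np.sum(delta**2)
        for y in range(q):
            for yp in range(q):
                # |sum_z delta(y - z) delta(y' - z)| <= sum_z delta(z)^2
                corr = np.sum(delta[(y - zs) % q] * delta[(yp - zs) % q])
                if abs(corr) > energy + 1e-12:
                    return False
    return True


print(check_elementary_bounds())  # True
```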

As for the remaining term, we use the following lemma, which is proved below.

Lemma C.6.

We have $\sum_{y\in\mathbb{Z}_{q}}\operatorname{\mathbb{E}}[\delta_{u}(y-x+\theta_{u})|\xi_{u}=\star]^{2}\leq qX^{2}$ .

This implies $k^{2}\Sigma^{2}\leq C(q)\hat{\kappa}^{2}(\varepsilon^{2}+X^{2})$ . This, combined with (C.6), allows us to deduce (C.7). Now we treat the last term (C.8):

[TABLE]

As in our treatment of $\Sigma$, we use expression (C.5) and perform a Taylor expansion to obtain

[TABLE]

Using (C.4), the cross term can be estimated as

[TABLE]

Now we conclude

[TABLE]

$\blacksquare$

Proof of Lemma C.6. For $y\in\mathbb{Z}_{q}$, using Lemma C.1, we have $\operatorname{\mathbb{E}}[\delta_{u}(y+\theta_{u})|\xi_{u}=\star]=\frac{1}{q}\sum_{z\in\mathbb{Z}_{q}}\operatorname{\mathbb{E}}[\delta_{u}(y+z)|\theta_{u}=z,\xi_{u}=\star]=\sum_{z\in\mathbb{Z}_{q}}\operatorname{\mathbb{E}}[\delta_{u}(y+z)\mu_{u}(z)|\xi_{u}=\star]=\sum_{z\in\mathbb{Z}_{q}}\operatorname{\mathbb{E}}[\delta_{u}(y+z)\delta_{u}(z)|\xi_{u}=\star]$. The last equality follows by writing $\mu_{u}(z)=\delta_{u}(z)+\frac{1}{q}$ and using $\sum_{z}\delta_{u}(y+z)=0$. Then

[TABLE]

Inequality $(a)$ follows from $(\sum_{i=1}^{n}x_{i})^{2}\leq n\sum_{i}x_{i}^{2}$, and $(b)$ follows from the Cauchy-Schwarz inequality. Lastly, by Lemma C.1 and $\sum_{z}\delta_{u}(z)=0$, we have $\sum_{y\in\mathbb{Z}_{q}}\operatorname{\mathbb{E}}[\delta_{u}(y)^{2}|\xi_{u}=\star]=\operatorname{\mathbb{E}}[\delta_{u}(\theta_{u})|\xi_{u}=\star]=X.$
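Lemma C.6, and the identity $\sum_{y}\operatorname{\mathbb{E}}[\delta_{u}(y)^{2}]=\operatorname{\mathbb{E}}[\delta_{u}(\theta_{u})]$ used at the end of its proof, can be illustrated on a toy model in which $\mu_{u}$ is an exact posterior, so that an identity of the type of Lemma C.1 holds. We assume, purely for illustration and not as the tree model of the paper, that $\theta$ is uniform on $\mathbb{Z}_{q}$, that one observes $o=\theta+\text{noise}$ with noise law $w$, and that $\mu$ is the posterior of $\theta$ given $o$; the conditioning on $\xi_{u}=\star$ is dropped. All expectations are computed exactly by enumeration; the function name is hypothetical.

```python
import numpy as np


def lemma_c6_toy(w):
    """Toy model: theta ~ Unif(Z_q), o = theta + noise, noise ~ w.

    The exact posterior is mu(z) = w(o - z) (uniform prior).
    Returns X = E[delta(theta)], E[sum_y delta(y)^2], and m[y] = E[delta(y + theta)].
    """
    w = np.asarray(w, dtype=float)
    q = len(w)
    zs = np.arange(q)
    X, energy, m = 0.0, 0.0, np.zeros(q)
    for theta in range(q):
        for noise in range(q):
            prob = w[noise] / q
            o = (theta + noise) % q
            delta = w[(o - zs) % q] - 1.0 / q      # delta = mu - 1/q
            X += prob * delta[theta]
            energy += prob * np.sum(delta**2)
            m += prob * delta[(zs + theta) % q]
    return X, energy, m


w = [0.6, 0.25, 0.1, 0.05]          # arbitrary noise law on Z_4
X, energy, m = lemma_c6_toy(w)
q = len(w)
print(np.isclose(X, energy))        # True: sum_y E[delta(y)^2] = E[delta(theta)]
print(np.sum(m**2) <= q * X**2)     # True: the bound of Lemma C.6
```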

Now we plug the estimates of Lemma C.5 into (C.3). Using $0\leq Z_{o}(x)/\sum_{y}Z_{o}(y)\leq 1$, we obtain

[TABLE]

where $C(q)$ is a constant that depends only on $q$ .

C.2 Proof of Corollary 5.4

We first prove the result concerning the overlap with ${\bm{\theta}}_{0}$ . Let $\sigma\in\mathscr{S}_{q}$ be a fixed permutation. We have

[TABLE]

The last line follows by Cauchy-Schwarz and then Jensen’s inequality. Averaging over $u\in V_{n}$ , applying Jensen’s inequality once more, and then using Theorem D yields the first statement.

Next, let $f:\mathbb{Z}_{q}\to\mathbb{R}$ with $\sum_{x\in\mathbb{Z}_{q}}f(x)=0$ and $\frac{1}{q}\sum_{x\in\mathbb{Z}_{q}}f(x)^{2}=1$. The loss of $\widehat{{\bm{X}}}^{(l)}$ is

[TABLE]

We have $\operatorname{\mathbb{E}}\|{\bm{X}}_{f}\|_{F}^{2}=\sum_{u,v\in V_{n}}\operatorname{\mathbb{E}}[f(\theta_{u})^{2}f(\theta_{v})^{2}]=\frac{n}{q}\sum_{x\in\mathbb{Z}_{q}}f(x)^{4}+n(n-1)$ . So $\lim\frac{1}{n^{2}}\operatorname{\mathbb{E}}\|{\bm{X}}_{f}\|_{F}^{2}=1$ . On the other hand, since $\sum_{x\in\mathbb{Z}_{q}}f(x)=0$ , we have

[TABLE]

where $\widehat{\delta}_{u,l,G_{n}}(x)=\widehat{\mu}_{G_{n},u,l}(x)-\frac{1}{q}$, $x\in\mathbb{Z}_{q}$. Moreover, we have

[TABLE]

We use the Cauchy-Schwarz inequality and the fact that $\sum_{x\in\mathbb{Z}_{q}}f(x)^{2}=q$ to obtain

[TABLE]

We apply Theorem D to obtain $\limsup_{l}\limsup_{n}\frac{1}{n^{2}}\operatorname{\mathbb{E}}\big{\langle}\widehat{{\bm{X}}}^{(l)},{\bm{X}}_{f}\big{\rangle}\leq C\|f\|_{\infty}^{2}\varepsilon$, and this yields the desired result.
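The second-moment formula $\operatorname{\mathbb{E}}\|{\bm{X}}_{f}\|_{F}^{2}=\frac{n}{q}\sum_{x}f(x)^{4}+n(n-1)$ can be checked by brute-force enumeration for small $n$ and $q$, assuming, as in the computation above, that ${\bm{X}}_{f}$ has entries $f(\theta_{u})f(\theta_{v})$ and that the $\theta_{u}$ are i.i.d. uniform on $\mathbb{Z}_{q}$. The test function $f(x)=\sqrt{2}\cos(2\pi x/q)$ with $q\geq 3$ is one arbitrary choice satisfying $\sum_{x}f(x)=0$ and $\frac{1}{q}\sum_{x}f(x)^{2}=1$; the function name is hypothetical.

```python
import itertools
import math


def frob_second_moment(f, n):
    """E ||X_f||_F^2 with (X_f)_{uv} = f(theta_u) f(theta_v), theta_u i.i.d. uniform on Z_q."""
    q = len(f)
    total = 0.0
    for thetas in itertools.product(range(q), repeat=n):   # all q**n labelings
        vals = [f[t] for t in thetas]
        total += sum(a * a * b * b for a in vals for b in vals)
    return total / q**n


q, n = 5, 4
f = [math.sqrt(2) * math.cos(2 * math.pi * x / q) for x in range(q)]
closed_form = n / q * sum(v**4 for v in f) + n * (n - 1)
print(math.isclose(frob_second_moment(f, n), closed_form))  # True
```

The diagonal terms $u=v$ contribute $\frac{n}{q}\sum_{x}f(x)^{4}$, and each of the $n(n-1)$ off-diagonal pairs contributes $(\frac{1}{q}\sum_{x}f(x)^{2})^{2}=1$.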

Bibliography

[AB18] Emmanuel Abbe and Enric Boix, An information-percolation bound for spin synchronization on general graphs, arXiv:1806.03227 (2018).
[Abb17] Emmanuel Abbe, Community detection and stochastic block models: recent developments, The Journal of Machine Learning Research 18 (2017), no. 1, 6446–6531.
[ABRS18] Emmanuel Abbe, Enric Boix, Peter Ralli, and Colin Sandon, Graph powering and spectral robustness, arXiv:1809.04818 (2018).
[AKV02] Noga Alon, Michael Krivelevich, and Van H. Vu, On the concentration of eigenvalues of random symmetric matrices, Israel Journal of Mathematics 131 (2002), no. 1, 259–267.
[AL07] David Aldous and Russell Lyons, Processes on unimodular random networks, Electron. J. Probab. 12 (2007), no. 54, 1454–1508.
[AMM+17] Emmanuel Abbe, Laurent Massoulié, Andrea Montanari, Allan Sly, and Nikhil Srivastava, Group synchronization on grids, arXiv:1706.08561 (2017).
[AW09] Arash A. Amini and Martin J. Wainwright, High-dimensional analysis of semidefinite relaxations for sparse principal components, Annals of Statistics 37 (2009), no. 5B, 2877–2921.
[BHK+16] Boaz Barak, Samuel B. Hopkins, Jonathan Kelner, Pravesh Kothari, Ankur Moitra, and Aaron Potechin, A nearly tight sum-of-squares lower bound for the planted clique problem, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), IEEE, 2016, pp. 428–437.


On the computational tractability of statistical estimation

Abstract

1 Introduction

Theorem A.

2 Related literature

3 Background

3.1 Further examples

3.2 Local–weak convergence and amenability

Definition 3.1.

Definition 3.2.

Definition 3.3.

Definition 3.4.

Definition 3.5.

4 Results for asymptotically amenable graphs

Theorem B.

Proposition 4.1.

Proof of Theorem B.

Proposition 4.2.

Corollary 4.3.

5 Results for random regular graphs

5.1 Information-theoretic reconstruction: An exhaustive search algorithm

Lemma 5.1.

Remark.

Theorem C.

Corollary 5.2.

Corollary 5.3.

5.2 Performance of the local algorithm

Theorem D.

Corollary 5.4.

Remark.

6 Proof of Proposition 4.1

6.1 Proof of the ‘local’ statement (4.3)

6.2 Proof of the ‘global’ statement (4.4)

Lemma 6.1.

Proof.

Proposition 6.2.

Lemma 6.3.

Proof.

Lemma 6.4.

Proof.

Proof of Proposition 6.2.

Acknowledgements

Appendix A Amenable graphs: some omitted proofs

A.1 Proof of Proposition 4.2

Lemma A.1 (Lemma 3.1 [Mon08]).

A.2 Proof of Corollary 4.3

Appendix B Information-theoretic reconstruction on random graphs: Technical proofs

B.1 Proof of Lemma 5.1

B.2 Proof of Theorem C: A truncated first moment method

Lemma B.1.

Proof.

Lemma B.2.

Proof.

Theorem E.

Proof.

Lemma B.3.

Proof.

B.3 Proof of Corollary 5.3

Appendix C Local algorithms on random graphs: Technical proofs

C.1 Proof of Theorem D

C.1.1 Preliminaries

Lemma C.1.

Proof.

Proposition C.2.

Corollary C.3.

Proof.

C.1.2 Proof of Proposition C.2: Analysis of the recursion on the tree

Lemma C.4.

Proof.

Lemma C.5.

Proof.

Lemma C.6.

C.2 Proof of Corollary 5.4
