Phase Transitions in Edge-Weighted Exponential Random Graphs: Near-Degeneracy and Universality

Ryan DeMuse; Danielle Larcomb; Mei Yin

arXiv:1706.02163·math.PR·June 10, 2019


TL;DR

This paper extends exponential random graph models to weighted networks by introducing a common distribution for edge weights, addressing limitations of traditional models that only handle simple graphs, and explores properties like near-degeneracy and universality.

Contribution

It proposes a new framework for weighted exponential random graphs with minimal assumptions on edge weight distribution, enabling modeling of more realistic weighted networks.

Findings

01

Identifies conditions for near-degeneracy in weighted models

02

Demonstrates universality properties in the extended framework

03

Provides theoretical insights into weighted network phase transitions

Abstract

Conventionally used exponential random graphs cannot directly model weighted networks as the underlying probability space consists of simple graphs only. Since many substantively important networks are weighted, this limitation is especially problematic. We extend the existing exponential framework by proposing a generic common distribution for the edge weights. Minimal assumptions are placed on the distribution, that is, it is non-degenerate and supported on the unit interval. By doing so, we recognize the essential properties associated with near-degeneracy and universality in edge-weighted exponential random graphs.


Tables (3)

Table 1: Limiting properties of $K(\theta)$ as $\theta\rightarrow\pm\infty$.

Limiting property of $K(\theta)$	$\theta$ limit
$K(\theta)\to-\infty$ or $K(\theta)\to l<0$	$\theta\to-\infty$
$K^{\prime}(\theta)\to 0$	$\theta\to-\infty$
$K^{\prime\prime}(\theta)\to 0$	$\theta\to-\infty$
$K(\theta)\to\infty$	$\theta\to\infty$
$K^{\prime}(\theta)\to 1$	$\theta\to\infty$
$K^{\prime\prime}(\theta)\to 0$	$\theta\to\infty$

Table 2: Asymptotic comparison for Bernoulli$(.5)$ near degeneracy.

$\beta_{1}$	$\beta_{2}$	$\theta_{opt}$	$u_{opt}$	$\exp(2\beta_{1})$	$1-\exp(-2(\beta_{1}+p\beta_{2}))$
$-2$	$-4$	$-4.23$	$0.014$	$0.018$	–
$1$	$1$	$5.99$	$0.998$	–	$0.998$

Table 3: Asymptotic comparison for Uniform$(0,1)$ near degeneracy.

$\beta_{1}$	$\beta_{2}$	$\theta_{opt}$	$u_{opt}$	$-1/(2\beta_{1})$	$1-1/(2(\beta_{1}+p\beta_{2}))$
$-4$	$-6$	$-10.32$	$0.097$	$0.125$	–
$3$	$2$	$13.40$	$0.925$	–	$0.929$
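The optima in Tables 2 and 3 can be cross-checked numerically. The sketch below is our own illustration, not the authors' code: it maximizes $L(u)=\beta_{1}u+\beta_{2}u^{p}-\frac{1}{2}I(u)$ by sweeping $\theta$ on a grid and using Legendre duality, $u=K^{\prime}(\theta)$ and $I(u)=\theta u-K(\theta)$, so no closed form for $I$ is needed. The tables do not state $p$; we assume $p=2$, a choice under which the computed $\theta_{opt}$ and $u_{opt}$ should match the tabulated values up to rounding. Function names (maximize_L and so on) are hypothetical.

```python
import math

# Sketch, not the authors' code. ASSUMPTION: p = 2 (the tables do not state p).

def bernoulli_K(t):
    """K(theta) = log((1 + e^theta) / 2) for Bernoulli(.5), computed stably."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t))) - math.log(2.0)

def bernoulli_dK(t):
    """K'(theta) = 1 / (1 + e^-theta)."""
    return 1.0 / (1.0 + math.exp(-t))

def uniform_K(t):
    """K(theta) = log((e^theta - 1) / theta) for Uniform(0,1), with K(0) = 0."""
    if abs(t) < 1e-8:
        return t / 2.0
    return math.log(math.expm1(t) / t)

def uniform_dK(t):
    """K'(theta) = e^theta / (e^theta - 1) - 1 / theta, with K'(0) = 1/2."""
    if abs(t) < 1e-4:
        return 0.5 + t / 12.0          # Taylor expansion near 0
    return math.exp(t) / math.expm1(t) - 1.0 / t

def maximize_L(K, dK, b1, b2, p=2, lo=-20.0, hi=20.0, step=1e-3):
    """Grid search over theta in [lo, hi]; returns (theta_opt, u_opt)."""
    best_L, best_t, best_u = -math.inf, None, None
    for k in range(int(round((hi - lo) / step)) + 1):
        t = lo + k * step
        u = dK(t)
        rate = t * u - K(t)            # I(u) at the dual point u = K'(theta)
        L = b1 * u + b2 * u ** p - 0.5 * rate
        if L > best_L:
            best_L, best_t, best_u = L, t, u
    return best_t, best_u

if __name__ == "__main__":
    cases = [("Bernoulli(.5)", bernoulli_K, bernoulli_dK, [(-2, -4), (1, 1)]),
             ("Uniform(0,1)", uniform_K, uniform_dK, [(-4, -6), (3, 2)])]
    for name, K, dK, rows in cases:
        for b1, b2 in rows:
            t, u = maximize_L(K, dK, b1, b2)
            print(f"{name} beta=({b1}, {b2}): theta_opt={t:.2f}, u_opt={u:.3f}")
```

Parametrizing by $\theta$ instead of $u$ avoids inverting $K^{\prime}$ and works for any $\mu$ whose cumulant generating function can be evaluated.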

Equations (98)

$$t(H,G_{n})=\frac{|\hom(H,G_{n})|}{|V(G_{n})|^{|V(H)|}}.$$

$$t(H,h)=\int_{[0,1]^{k}}\prod_{\{i,j\}\in E(H)}h(x_{i},x_{j})\,dx_{1}\cdots dx_{k}.$$

$$d_{\square}(f,h)=\sup_{S,T\subseteq[0,1]}\left|\int_{S\times T}\big(f(x,y)-h(x,y)\big)\,dx\,dy\right|$$

$$\mathbb{P}_{n}^{\beta}(G_{n})=\exp\Big(n^{2}\big(\beta_{1}t(H_{1},G_{n})+\beta_{2}t(H_{2},G_{n})-\psi_{n}^{\beta}\big)\Big)\,\mathbb{P}_{n}(G_{n}),$$

$$\psi_{n}^{\beta}=\frac{1}{n^{2}}\log\mathbb{E}_{n}\Big(\exp\big(n^{2}(\beta_{1}t(H_{1},G_{n})+\beta_{2}t(H_{2},G_{n}))\big)\Big).$$

$$I(u)=\sup_{\theta\in\mathbb{R}}\{\theta u-K(\theta)\},$$

$$K^{\prime\prime\prime}(0)=\mathbb{E}(X^{3})-3\mathbb{E}(X^{2})\mathbb{E}(X)+2\left(\mathbb{E}(X)\right)^{3}=0.$$

$$I(u)\geq\sup\Big\{\sup_{\theta\geq 0}\{\theta(u-1)\},\ \sup_{\theta\leq 0}\{\theta u\}\Big\}.$$

$$K(-\theta)=\log\int e^{-\theta}e^{\theta y}\,\mu(dy)=-\theta+K(\theta).$$

$$I(u)=\theta\big(1-K^{\prime}(-\theta)\big)-\big(K(-\theta)+\theta\big)=(-\theta)K^{\prime}(-\theta)-K(-\theta)=I(1-u).$$

$$\psi_{\infty}^{\beta}=\sup_{u}\Big(\beta_{1}u+\beta_{2}u^{p}-\frac{1}{2}I(u)\Big),$$

$$\lim_{n\to\infty}\delta_{\square}\big(\tilde{h}^{G_{n}},\tilde{u}\big)=0\ \text{almost surely},$$

$$K^{\prime\prime\prime}(\theta)K^{\prime}(\theta)=-(p-2)\left(K^{\prime\prime}(\theta)\right)^{2}$$

$$L(u;\beta_{1},\beta_{2})=\beta_{1}u+\beta_{2}u^{p}-\frac{1}{2}I(u)$$

$$L^{\prime}(u)=\beta_{1}+p\beta_{2}u^{p-1}-\frac{1}{2}I^{\prime}(u),$$

$$L^{\prime\prime}(u)=p(p-1)\beta_{2}u^{p-2}-\frac{1}{2}I^{\prime\prime}(u).$$

$$I(u)+K(\theta)=\theta u,$$

$$u=K^{\prime}(\theta)\quad\text{and}\quad I^{\prime\prime}(u)K^{\prime\prime}(\theta)=1.$$

$$m(u)=\frac{I^{\prime\prime}(u)}{2p(p-1)u^{p-2}}$$

$$n(\theta)=2p(p-1)K^{\prime\prime}(\theta)\left(K^{\prime}(\theta)\right)^{p-2},$$

$$\lim_{\theta\to-\infty}n(\theta)=0,$$

$$\lim_{\theta\to 0}n(\theta)=2p(p-1)\operatorname{Var}(X)\left(\mathbb{E}(X)\right)^{p-2},$$

$$\lim_{\theta\to\infty}n(\theta)=0,$$

$$n^{\prime}(\theta)=2p(p-1)\left(K^{\prime}(\theta)\right)^{p-3}\Big(K^{\prime\prime\prime}(\theta)K^{\prime}(\theta)+(p-2)\left(K^{\prime\prime}(\theta)\right)^{2}\Big)$$

$$f(u)=\frac{uI^{\prime\prime}(u)}{2(p-1)}-\frac{1}{2}I^{\prime}(u).$$

$$f^{\prime}(u)=\frac{uI^{\prime\prime\prime}(u)-(p-2)I^{\prime\prime}(u)}{2(p-1)}=pu^{p-1}m^{\prime}(u).$$

$$f(u)-f(u_{0})=\int_{u_{0}}^{u}f^{\prime}(t)\,dt\geq pu_{0}^{p-1}\int_{u_{0}}^{u}m^{\prime}(t)\,dt=pu_{0}^{p-1}\big(m(u)-m(u_{0})\big),$$

$$f(\mathbb{E}(X))=\frac{\mathbb{E}(X)}{2(p-1)\operatorname{Var}(X)},$$


Full text

Department of Mathematics, University of Denver, Denver, CO 80208, USA


Phase transitions in edge-weighted exponential random graphs: Near-degeneracy and universality

Ryan DeMuse

Danielle Larcomb

Mei Yin

Mei Yin’s research was partially supported by NSF grant DMS-1308333.



Keywords:

Exponential random graphs · Legendre duality · Phase transitions · Near degeneracy and universality

MSC:

05C80 · 82B26

Journal: Journal of Statistical Physics

1 Introduction

Large networks have become increasingly popular over the last decades, and their modeling and investigation have led to interesting new ways of applying statistical and analytical methods. Much of the random graph literature has evolved from the famous Erdős-Rényi graph, where edges are joined between vertices independently with the same probability. While this simple construction has attracted significant mathematical interest, it cannot model real-world networks, which exhibit noticeable attributes such as clustering and transitivity. The introduction of exponential random graphs has aided in this pursuit, as they capture a wide variety of common network tendencies by representing a complex global structure through a set of tractable local features FS Newman WF . See Besag Besag , Snijders et al. SPRH , Rinaldo et al. RFZ , and Fienberg Fienberg1 Fienberg2 for history and a review of developments.

These rather general models are exponential families of probability distributions over graphs, in which dependence between the random edges is defined through certain finite subgraphs. Inquiries into exponential random graphs have addressed the variational principle of the limiting normalization constant, concentration of the limiting probability distribution, phase transitions, and asymptotic structures. See for example Chatterjee and Varadhan CV , Chatterjee and Diaconis CD1 , Radin and Yin RY , Lubetzky and Zhao LZ1 LZ2 , Radin and Sadun RS1 RS2 , Radin et al. RRS , Kenyon et al. KRRS , Yin Yin2013 , Kenyon and Yin KY , Aristoff and Zhu AZ2 , and Chatterjee and Dembo CD2 . Many of these papers utilize the elegant theory of graph limits as developed by Lovász and coauthors (V.T. Sós, B. Szegedy, C. Borgs, J. Chayes, K. Vesztergombi, …) BCLSV1 BCLSV2 BCLSV3 Lov LS . Building on earlier work of Aldous Aldous1 and Hoover Hoover , the graph limit theory creates a new set of tools for representing and studying the asymptotic behavior of graphs by connecting sequences of graphs $G_{n}$ , which are discrete objects that lie in different probability spaces, to a unified graphon space $\mathcal{W}$ , which is an abstract functional space equipped with a cut metric. Though the theory itself is tailored to dense graphs, whose number of edges scales like the square of the number of vertices, parallel theories for sparse graphs are likewise emerging. See Benjamini and Schramm BS , Aldous and Steele AS , Aldous and Lyons AL , and Lyons Lyons , where the notion of local weak convergence is discussed, and the recent works of Borgs et al. BCCZ1 BCCZ2 , which are making progress towards enriching the existing $L^{\infty}$ theory of dense graph limits by developing a limiting object for sparse graph sequences based on $L^{p}$ graphons.

Despite their flexibility, conventionally used exponential random graphs suffer from some deficiencies that may hamper their utility to researchers. The major shortcomings are degeneracy problems, a sensitivity to missing data, and an inability to model weighted networks CD . The underlying probability space of the standard exponential random graph model consists of simple graphs only, yet many substantively important networks arising from a host of applications, including socio-econometric data and neuroscience, are weighted, so this limitation is especially problematic. Consider a social network graph with vertices being people and edges indicating a relationship. We expect family members to have stronger relationships with one another than workplace colleagues do. This can be reflected by placing weights on the edges that encode a prior belief in the strength of connection, with low weights on edges between coworkers and high weights on edges between family members. Properly adjusting the edge weights thus allows the modeling of a broad range of networks, whether they consist mostly of familial ties or mostly of acquaintances.

Alternatively, a simple graph may be viewed as a complete graph whose edge weights are iid Bernoulli random variables. Following this perspective, Yin Yin2016 extended the exponential framework by putting a generic common distribution on the iid edge weights. After deriving a variational principle for the limiting normalization constant and an associated concentration of measure, that work obtained an explicit characterization of the asymptotic phase transition for exponential models with uniformly distributed edge weights. This work expands upon the setting in Yin2016 and places minimal assumptions on the edge-weight distribution, namely that it is non-degenerate and supported on the unit interval. By doing so, we strive to discover universal asymptotic behavior, i.e. behavior that does not depend on the particular edge-weight distribution, in the near-degenerate regions of the parameter space where the graph is sparse (almost entirely unconnected) or nearly complete (almost fully connected) CD Handcock Yin .

The rest of this paper is organized as follows. In Section 2 we provide basics of graph limit theory and introduce key features of edge-weighted exponential random graphs. In Section 3 we summarize important properties of Legendre duality between the cumulant generating function and the Cramér rate function for the edge-weight distribution. In Section 4 we show the existence of a first order phase transition curve ending in a second order critical point in general edge-weighted exponential random graph models through a detailed analysis of a maximization problem for the normalization constant. Lastly, in Section 5 we explore the universal and non-universal asymptotics concerning the phase transition.

2 Background

Consider the set $\mathcal{G}_{n}$ of all simple edge-weighted complete labeled graphs $G_{n}$ on $n$ vertices (“simple” means undirected, with no loops or multiple edges), where the edge weights $x_{ij}$ between vertex $i$ and vertex $j$ are iid real random variables with a non-degenerate common distribution $\mu$ supported on $[0,1]$ . Any such graph $G_{n}$ , irrespective of the number of vertices, may be represented as an element $h^{G_{n}}$ of a single abstract space $\mathcal{W}$ that consists of all symmetric measurable functions $h(x,y)$ from the unit square $[0,1]^{2}$ into the unit interval $[0,1]$ (referred to as “graph limits” or “graphons”), by setting $h^{G_{n}}(x,y)$ equal to the edge weight between vertices $\lceil nx\rceil$ and $\lceil ny\rceil$ of $G_{n}$ . The common distribution $\mu$ for the edge weights yields a probability measure $\mathbb{P}_{n}$ and the associated expectation $\mathbb{E}_{n}$ on $\mathcal{G}_{n}$ , and further induces a probability measure $\mathbb{Q}_{n}$ on the space $\mathcal{W}$ under the graphon representation.
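The graphon $h^{G_{n}}$ is a step function and is easy to evaluate. The following is a minimal sketch under stated assumptions: weights are stored in a symmetric matrix, the diagonal (which the text leaves unspecified, since there are no loops) is set to $0$, and the null set $\{x=0\}$, where $\lceil nx\rceil=0$, is sent to vertex $1$. The helper names sample_weights and graphon_of are hypothetical.

```python
import math
import random

def sample_weights(n, draw, rng):
    """Symmetric n x n matrix with iid weights x_ij = x_ji ~ mu for i < j.
    draw(rng) samples one value from mu. ASSUMPTION: the diagonal is 0."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = draw(rng)
    return W

def graphon_of(W):
    """Return h^{G_n}(x, y) = weight between vertices ceil(n x) and ceil(n y)."""
    n = len(W)
    def h(x, y):
        # vertices are labeled 1..n; the null set {x = 0} is sent to vertex 1
        i = max(math.ceil(n * x), 1) - 1
        j = max(math.ceil(n * y), 1) - 1
        return W[i][j]
    return h

rng = random.Random(0)
W = sample_weights(5, lambda r: r.random(), rng)   # Uniform(0,1) edge weights
h = graphon_of(W)
# x in (0, 1/5] corresponds to vertex 1, y in (4/5, 1] to vertex 5
print(h(0.15, 0.9), W[0][4])
```

Passing `lambda r: float(r.random() < 0.5)` as the sampler gives Bernoulli$(.5)$ weights, i.e. a uniformly random simple graph.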

For a finite simple graph $H$ with vertex set $V(H)=[k]=\{1,...,k\}$ and edge set $E(H)$ and a simple graph $G_{n}$ on $n$ vertices, there is a notion of density of graph homomorphisms, denoted by $t(H,G_{n})$ , which indicates the probability that a random vertex map $V(H)\to V(G_{n})$ is edge-preserving,

$$t(H,G_{n})=\frac{|\hom(H,G_{n})|}{|V(G_{n})|^{|V(H)|}}.\qquad(1)$$

For a graphon $h\in\mathcal{W}$ , define the graphon homomorphism density

$$t(H,h)=\int_{[0,1]^{k}}\prod_{\{i,j\}\in E(H)}h(x_{i},x_{j})\,dx_{1}\cdots dx_{k}.\qquad(2)$$

Then $t(H,G_{n})=t(H,h^{G_{n}})$ by construction, and we take (2) with $h=h^{G_{n}}$ as the definition of graph homomorphism density $t(H,G_{n})$ for an edge-weighted complete graph $G_{n}$ . This graphon interpretation enables us to capture the notion of convergence in terms of subgraph densities by an explicit “cut distance” on $\mathcal{W}$ :

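$$d_{\square}(f,h)=\sup_{S,T\subseteq[0,1]}\left|\int_{S\times T}\big(f(x,y)-h(x,y)\big)\,dx\,dy\right|\qquad(3)$$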

for $f,h\in\mathcal{W}$ . Except for a technical complication explained below, a sequence of edge-weighted graphs converges under the cut metric if and only if its homomorphism densities converge for all finite simple graphs, and the limiting homomorphism densities then describe the resulting graphon.
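Definitions (1) and (2) can be evaluated directly for small graphs. The sketch below brute-forces the average over all maps $V(H)\to V(G_{n})$ and compares it with closed forms for an edge and a triangle. We assume the weights are stored in a symmetric matrix with zero diagonal, so that maps sending both ends of an edge of $H$ to the same vertex contribute $0$, as in the simple-graph case. All names are ours and are for illustration only.

```python
import itertools
import random

def hom_density(H_edges, k, W):
    """t(H, G_n): average over all maps phi: [k] -> [n] of the product of
    W[phi(i)][phi(j)] over the edges {i, j} of H (vertices labeled 0..k-1)."""
    n = len(W)
    total = 0.0
    for phi in itertools.product(range(n), repeat=k):
        prod = 1.0
        for i, j in H_edges:
            prod *= W[phi[i]][phi[j]]
        total += prod
    return total / n ** k

def edge_density(W):
    """t(edge, G_n) = (sum of all entries of W) / n^2."""
    n = len(W)
    return sum(map(sum, W)) / n ** 2

def triangle_density(W):
    """t(triangle, G_n) = trace(W^3) / n^3."""
    n = len(W)
    W2 = [[sum(W[i][m] * W[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    return sum(W2[i][j] * W[j][i] for i in range(n) for j in range(n)) / n ** 3

EDGE, TRIANGLE = [(0, 1)], [(0, 1), (1, 2), (0, 2)]

# complete simple graph K_4: t(edge) = 12/16, t(triangle) = 24/64
K4 = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
print(hom_density(EDGE, 2, K4), hom_density(TRIANGLE, 3, K4))

# Uniform(0,1) edge weights with zero diagonal
rng = random.Random(1)
n = 5
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        W[i][j] = W[j][i] = rng.random()
print(hom_density(TRIANGLE, 3, W), triangle_density(W))
```

For $0/1$ weights the brute force counts $|\hom(H,G_{n})|$ exactly, so it agrees with (1); for general weights it computes (2) at $h=h^{G_{n}}$.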

The technical complication is that the topology induced by the cut metric is well defined only up to measure preserving transformations of $[0,1]$ (and up to sets of Lebesgue measure zero), which may be thought of as a vertex relabeling in the context of finite graphs. To tackle this issue, an equivalence relation $\sim$ is introduced in $\mathcal{W}$ . We say that $f\sim h$ if $f(x,y)=h_{\sigma}(x,y):=h(\sigma x,\sigma y)$ for some measure preserving bijection $\sigma$ of $[0,1]$ . Let $\tilde{h}$ (referred to as a “reduced graphon”) denote the equivalence class of $h$ in $(\mathcal{W},d_{\square})$ . Since $d_{\square}$ is invariant under $\sigma$ , one can then define on the resulting quotient space $\widetilde{\mathcal{W}}$ the natural distance $\delta_{\square}$ by $\delta_{\square}(\tilde{f},\tilde{h})=\inf_{\sigma_{1},\sigma_{2}}d_{\square}(f_{\sigma_{1}},h_{\sigma_{2}})$ , where the infimum ranges over all measure preserving bijections $\sigma_{1}$ and $\sigma_{2}$ , making $(\widetilde{\mathcal{W}},\delta_{\square})$ into a metric space. With some abuse of notation we also refer to $\delta_{\square}$ as the “cut distance”. After identifying graphs that are the same after vertex relabeling, the probability measure $\mathbb{P}_{n}$ yields probability measure $\tilde{\mathbb{P}}_{n}$ and the associated expectation $\tilde{\mathbb{E}}_{n}$ (which coincides with $\mathbb{E}_{n}$ ). Correspondingly, the probability measure $\mathbb{Q}_{n}$ induces probability measure $\tilde{\mathbb{Q}}_{n}$ on the space $\widetilde{\mathcal{W}}$ under the measure preserving transformations. The space $(\widetilde{\mathcal{W}},\delta_{\square})$ is a compact space and homomorphism densities $t(H,\cdot)$ are continuous functions on it.
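For two graphs on the same $n$ vertices, viewed as step graphons on a common equipartition of $[0,1]$, the supremum in (3) is attained at sets $S,T$ that are unions of parts: for fixed $T$ the integral is linear in the fraction of each part covered by $S$, and vice versa. The cut distance can therefore be computed by brute force for very small $n$. Minimizing over vertex relabelings, each of which is a measure preserving bijection of $[0,1]$, gives an upper bound on $\delta_{\square}$, which also allows finer alignments. A sketch with hypothetical names:

```python
import itertools

def cut_distance_step(A, B):
    """d_box between the step graphons of n x n matrices A and B on the same
    equipartition. S ranges over vertex subsets; for fixed S the best T takes
    all positive (or all negative) column sums. Cost O(2^n n^2): tiny n only."""
    n = len(A)
    D = [[A[i][j] - B[i][j] for j in range(n)] for i in range(n)]
    best = 0.0
    for S in itertools.product((False, True), repeat=n):
        col = [sum(D[i][j] for i in range(n) if S[i]) for j in range(n)]
        best = max(best, sum(c for c in col if c > 0), -sum(c for c in col if c < 0))
    return best / n ** 2

def relabel_upper_bound(A, B):
    """min over vertex permutations s of d_box(A, B_s), B_s[i][j] = B[s[i]][s[j]].
    This is an upper bound on delta_box, not necessarily equal to it."""
    n = len(A)
    return min(cut_distance_step(A, [[B[s[i]][s[j]] for j in range(n)] for i in range(n)])
               for s in itertools.permutations(range(n)))

# the same 3-vertex star, centered at vertex 0 and at vertex 2
star_at_0 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
star_at_2 = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
print(cut_distance_step(star_at_0, star_at_2), relabel_upper_bound(star_at_0, star_at_2))
```

The two labeled stars are at positive cut distance $d_{\square}$, but they are the same graph after relabeling, so the bound on $\delta_{\square}$ is $0$.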

By a $2$ -parameter family of edge-weighted exponential random graphs we mean a family of probability measures $\mathbb{P}_{n}^{\beta}$ on $\mathcal{G}_{n}$ defined by, for $G_{n}\in\mathcal{G}_{n}$ ,

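$$\mathbb{P}_{n}^{\beta}(G_{n})=\exp\Big(n^{2}\big(\beta_{1}t(H_{1},G_{n})+\beta_{2}t(H_{2},G_{n})-\psi_{n}^{\beta}\big)\Big)\,\mathbb{P}_{n}(G_{n}),\qquad(4)$$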

where $\beta=(\beta_{1},\beta_{2})$ are $2$ real parameters, $H_{1}$ is a single edge, $H_{2}$ is a finite simple graph with $p\geq 2$ edges, $t(H_{i},G_{n})$ is the density of graph homomorphisms, $\mathbb{P}_{n}$ is the probability measure induced by the common distribution $\mu$ for the edge weights, and $\psi_{n}^{\beta}$ is the normalization constant (free energy density),

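$$\psi_{n}^{\beta}=\frac{1}{n^{2}}\log\mathbb{E}_{n}\Big(\exp\big(n^{2}(\beta_{1}t(H_{1},G_{n})+\beta_{2}t(H_{2},G_{n}))\big)\Big).\qquad(5)$$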

Since homomorphism densities $t(H_{i},G_{n})$ are preserved under vertex relabeling, the probability measure $\tilde{\mathbb{P}}_{n}^{\beta}$ and the associated expectation $\tilde{\mathbb{E}}_{n}^{\beta}$ (which coincides with $\mathbb{E}_{n}^{\beta}$ ) may likewise be defined.
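For $\mu=$ Bernoulli$(.5)$, $\mathbb{P}_{n}$ is the uniform measure on the $2^{\binom{n}{2}}$ simple graphs on $n$ vertices, so for very small $n$ the normalization constant (5) can be computed exactly by enumeration. The sketch below does this with hypothetical names. The choice of a triangle for $H_{2}$ and of $n=4$ is ours, for illustration. As a check, for $\beta_{2}=0$ the edges decouple. Since $t(H_{1},G_{n})=2|E(G_{n})|/n^{2}$, we get $\psi_{n}^{\beta}=\binom{n}{2}n^{-2}\log\big((1+e^{2\beta_{1}})/2\big)$.

```python
import itertools
import math

def hom_count(H_edges, k, W):
    """|hom(H, G)| for a simple graph G with 0/1 adjacency matrix W."""
    n = len(W)
    return sum(all(W[f[i]][f[j]] for i, j in H_edges)
               for f in itertools.product(range(n), repeat=k))

def psi_n(n, b1, b2, H2_edges, k2):
    """Exact psi_n^beta for mu = Bernoulli(.5), averaging over all simple graphs
    on n vertices (uniform under P_n). Exponential cost: tiny n only."""
    pairs = list(itertools.combinations(range(n), 2))
    exponents = []
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        W = [[0] * n for _ in range(n)]
        for (i, j), b in zip(pairs, bits):
            W[i][j] = W[j][i] = b
        t1 = hom_count([(0, 1)], 2, W) / n ** 2          # t(H1, G_n), H1 an edge
        t2 = hom_count(H2_edges, k2, W) / n ** k2        # t(H2, G_n)
        exponents.append(n ** 2 * (b1 * t1 + b2 * t2))
    top = max(exponents)                                 # stable log-mean-exp
    log_mean = top + math.log(sum(math.exp(e - top) for e in exponents) / len(exponents))
    return log_mean / n ** 2

triangle = [(0, 1), (1, 2), (0, 2)]
print(psi_n(4, -0.5, 1.0, triangle, 3))
```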

Since exponential random graph models are exponential families with bounded support, one might expect them to enjoy a rather basic asymptotic form; in fact, virtually all these models are highly nonstandard as $n$ increases. The $2$ -parameter edge-weighted exponential random graph models are simpler than their $k$ -parameter extensions but nevertheless exhibit a wealth of non-trivial characteristics and capture a variety of interesting features displayed by large networks. Furthermore, the relative simplicity provides insight into the expressive power of the exponential construction. In statistical physics, we refer to $\beta_{1}$ as the particle parameter and $\beta_{2}$ as the energy parameter. Accordingly, the exponential model (4) is said to be “attractive” if $\beta_{2}$ is positive and “repulsive” if $\beta_{2}$ is negative. In this paper we will concentrate on “attractive” $2$ -parameter models. The interest in these models is well justified. Consider the friendship graph, for example, where the edge weights between different vertex pairs measure the strength of mutual friendship. Take $H_{1}$ to be an edge and $H_{2}$ a triangle. Since a friend of a friend is likely also a friend, the influence of a triangle that assesses the bond of a $3$ -way friendship should be emphasized, and this corresponds to taking $\beta_{2}\geq 0$ . The edge-triangle model thus captures transitivity when $n$ is finite, but this transitivity is gradually lost as $n$ tends to infinity, in the sense that the model produces a graph that looks similar to an Erdős-Rényi graph with respect to the cut metric (see detailed discussions in Sections 4 and 5).

3 Legendre transform and duality

In this section we present properties of the cumulant generating function $K(\theta)=\log\int_{0}^{1}e^{\theta x}\,\mu(dx)$ and the Cramér rate function $I(u)$ for the edge-weight distribution $\mu$ that are relevant to our investigation. We will see that $K(\theta)$ is convex on $\mathbb{R}$ , which allows the application of the Legendre transform. Let $I:A\to\mathbb{R}$ be the Legendre transform of $K$ given by

$$I(u)=\sup_{\theta\in\mathbb{R}}\{\theta u-K(\theta)\},\qquad(6)$$

where $A$ , the domain of $I$ , consists of all $u$ such that $I(u)<\infty$ . Note that in large deviation theory, $I$ is commonly referred to as the Cramér conjugate rate function for the distribution $\mu$ . It follows from the theorems in Chapter 2 (“Analytic Properties”) of Brown that the Legendre transform connecting $K$ and $I$ is an involution, $I$ is smooth and strictly convex everywhere it is defined, and there is a one-to-one relationship between $K$ and $I$ . Lemma 1 and Proposition 1 then discuss properties of $K(\theta)$ and $I(u)$ under the additional assumption that $\mu$ is symmetric. These properties will be useful in Section 5 when we explore universality in edge-weighted exponential random graphs.
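When $K$ can be evaluated, (6) can be computed numerically. Since $K$ is convex, $\theta\mapsto\theta u-K(\theta)$ is concave and its maximizer solves $K^{\prime}(\theta)=u$. Because $K^{\prime}$ is increasing, bisection finds that maximizer. The sketch below uses Bernoulli$(.5)$, whose rate function $I(u)=u\log u+(1-u)\log(1-u)+\log 2$ and derivative $I^{\prime}(u)=\log\frac{u}{1-u}$ are known in closed form and serve as a check. The function names are ours.

```python
import math

def legendre(K, dK, u, lo=-50.0, hi=50.0, iters=200):
    """I(u) = sup_theta {theta*u - K(theta)} for 0 < u < 1, via bisection on
    K'(theta) = u (valid because K is convex). Returns (I(u), theta)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dK(mid) < u:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return theta * u - K(theta), theta

def K_bern(t):
    """CGF of Bernoulli(.5): log((1 + e^t) / 2), computed stably."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t))) - math.log(2.0)

def dK_bern(t):
    """K'(t) for Bernoulli(.5): the logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

def I_bern(u):
    """Closed form of the Bernoulli(.5) rate function."""
    return u * math.log(u) + (1 - u) * math.log(1 - u) + math.log(2.0)

for u in (0.1, 0.3, 0.5, 0.9):
    I_num, theta = legendre(K_bern, dK_bern, u)
    print(f"u={u}: numeric I={I_num:.12f}, closed form={I_bern(u):.12f}, theta={theta:.6f}")
```

The returned $\theta$ is the dual point of $u$, so the same routine also gives $I^{\prime}(u)=\theta$.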

Lemma 1

Consider a non-degenerate probability measure $\mu$ supported on $[0,1]$ . Let $K(\theta)$ be the associated cumulant generating function. If $\mu$ is symmetric about the line $u=1/2$ , then $K^{\prime\prime\prime}(0)K^{\prime}(0)+(p-2)\left(K^{\prime\prime}(0)\right)^{2}\geq 0$ , and equality is obtained only when $p=2$ .

Proof

Let $X$ be a random variable distributed according to $\mu$ . By symmetry, $\mathbb{E}(X)=1/2$ and $\mathbb{E}(X^{3})=3\mathbb{E}(X^{2})/2-1/4$ . This implies that $K^{\prime}(0)=\mathbb{E}(X)=1/2$ and $K^{\prime\prime}(0)=\mathbb{E}(X^{2})-\left(\mathbb{E}(X)\right)^{2}=\mathbb{E}(X^{2})-1/4$ . Also,

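$$K^{\prime\prime\prime}(0)=\mathbb{E}(X^{3})-3\mathbb{E}(X^{2})\mathbb{E}(X)+2\left(\mathbb{E}(X)\right)^{3}=0.\qquad(7)$$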

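Hence $K^{\prime\prime\prime}(0)K^{\prime}(0)+(p-2)\left(K^{\prime\prime}(0)\right)^{2}=(p-2)\left(\operatorname{Var}(X)\right)^{2}\geq 0$ , since $p\geq 2$ . As $\mu$ is non-degenerate, $\operatorname{Var}(X)>0$ , so equality holds only when $p=2$ . The claim thus follows.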
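Lemma 1 can be sanity-checked on any discrete law symmetric about $1/2$, using that $K^{\prime}(0)$, $K^{\prime\prime}(0)$ and $K^{\prime\prime\prime}(0)$ are the mean, the variance and the third central moment. The five-point distribution below is an arbitrary illustrative choice of ours. The last line shows that without symmetry the quantity can be negative even for $p=2$, using Bernoulli$(0.8)$.

```python
def cumulants(points, weights):
    """K'(0), K''(0), K'''(0) of a discrete law: mean, variance, third central moment."""
    mean = sum(w * x for x, w in zip(points, weights))
    var = sum(w * (x - mean) ** 2 for x, w in zip(points, weights))
    k3 = sum(w * (x - mean) ** 3 for x, w in zip(points, weights))
    return mean, var, k3

def lemma1_quantity(points, weights, p):
    """K'''(0) K'(0) + (p - 2) K''(0)^2."""
    k1, k2, k3 = cumulants(points, weights)
    return k3 * k1 + (p - 2) * k2 ** 2

sym_points, sym_weights = [0.0, 0.2, 0.5, 0.8, 1.0], [0.1, 0.2, 0.4, 0.2, 0.1]
print(lemma1_quantity(sym_points, sym_weights, 2))   # zero up to rounding
print(lemma1_quantity(sym_points, sym_weights, 3))   # Var(X)^2 = 0.086^2
print(lemma1_quantity([0.0, 1.0], [0.2, 0.8], 2))    # Bernoulli(0.8): negative
```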

Lemma 2

Consider a non-degenerate probability measure $\mu$ supported on $[0,1]$ . Let $I(u)$ be the associated Cramér rate function (6). Then the domain of $I$ is a subset of $[0,1]$ .

Proof

Since $\mu$ is supported on $\left[0,1\right]$ , we have $0\leq K(\theta)\leq\theta$ if $\theta\geq 0$ , and $\theta\leq K(\theta)\leq 0$ if $\theta\leq 0$ . This gives

$$I(u)\geq\sup\Big\{\sup_{\theta\geq 0}\{\theta(u-1)\},\ \sup_{\theta\leq 0}\{\theta u\}\Big\}.\qquad(8)$$

If $u>1$ then $\sup_{\theta\geq 0}\left\{{\theta}\left(u-1\right)\right\}=\infty$ and thus $I(u)$ is not finite. Similarly, if $u<0$ then $\sup_{\theta\leq 0}\left\{{\theta}u\right\}=\infty$ and thus $I(u)$ is not finite. The conclusion readily follows.

Analyzing properties of $K(\theta)$ and $I(u)$ in detail will give a stronger conclusion than Lemma 2. We recognize that the cumulant generating function $K(\theta)$ satisfies $K(0)=0$ , $K^{\prime}(0)=\mathbb{E}(X)$ , and $K^{\prime\prime}(0)=\operatorname{\mathrm{Var}}(X)$ , where $X$ is a random variable distributed according to $\mu$ . See Table 1 for important limiting properties of $K(\theta)$ as $\theta\rightarrow\pm\infty$ . By Legendre duality, every $u\in(0,1)$ uniquely corresponds to a $\theta\in\left(-\infty,\infty\right)$ , with $K^{\prime}(\theta)=u$ and $I^{\prime}(u)=\theta$ . This implies that $I(\mathbb{E}(X))=I^{\prime}(\mathbb{E}(X))=0$ , and $I(u)$ is decreasing on $(0,\mathbb{E}(X))$ and increasing on $(\mathbb{E}(X),1)$ . We also note that $I(0)$ and $I(1)$ , depending on the probability distribution $\mu$ , may be either finite or infinite. In the former case, the domain of $I$ is $[0,1]$ (as for Bernoulli $(.5)$ ). In the latter case, the domain of $I$ is $(0,1)$ (as for Uniform $(0,1)$ ).
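These properties can be illustrated for Uniform$(0,1)$, where $K(\theta)=\log\frac{e^{\theta}-1}{\theta}$ is explicit but $I$ is not. The sketch below, with names of our own choosing, obtains $I$ from the Legendre duality by bisection. It then checks $K(0)=0$, $K^{\prime}(0)=1/2$ and $K^{\prime\prime}(0)=\operatorname{Var}(X)=1/12$, the last by a finite difference. It also checks the limits of $K^{\prime}$ in Table 1, $I(\mathbb{E}(X))=0$, the monotonicity of $I$ on either side of $\mathbb{E}(X)$, and that $I$ becomes large near both endpoints, consistent with the domain $(0,1)$. The bracketing interval and tolerances are our choices.

```python
import math

def K(t):
    """CGF of Uniform(0,1): log((e^t - 1)/t), K(0) = 0, written to avoid overflow."""
    if abs(t) < 1e-8:
        return t / 2.0
    if t > 0:
        return t + math.log(-math.expm1(-t)) - math.log(t)
    return math.log(math.expm1(t) / t)

def dK(t):
    """K'(t) = e^t/(e^t - 1) - 1/t, K'(0) = 1/2."""
    if abs(t) < 1e-4:
        return 0.5 + t / 12.0          # Taylor expansion near 0
    if t > 0:
        return 1.0 / (-math.expm1(-t)) - 1.0 / t
    return math.exp(t) / math.expm1(t) - 1.0 / t

def I(u, lo=-1e7, hi=1e7):
    """Rate function via Legendre duality: bisection for K'(theta) = u."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if dK(mid) < u else (lo, mid)
    theta = 0.5 * (lo + hi)
    return theta * u - K(theta)

h = 1e-3
print("K(0) =", K(0.0), " K'(0) =", dK(0.0), " K''(0) ~", (dK(h) - dK(-h)) / (2 * h))
for u in (1e-6, 0.1, 0.3, 0.5, 0.7, 0.9, 1 - 1e-6):
    print(u, I(u))
```

For small $u$ the dual point is $\theta\approx-1/u$, so $I(u)\approx-1-\log u$ grows without bound as $u\to 0$, and by symmetry also as $u\to 1$.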

Proposition 1

Consider a non-degenerate probability measure $\mu$ supported on $[0,1]$ . Let $I(u)$ be the associated Cramér rate function (6). If $\mu$ is symmetric about the line $u=1/2$ , then $I(u)$ is also symmetric about the line $u=1/2$ .

Proof

Let $\theta\in\operatorname{\mathbb{R}}$ . Under the symmetry assumption, we will show, by a simple change of variable $x=1-y$ , that $K(-\theta)=-\theta+K(\theta)$ .

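$$K(-\theta)=\log\int_{0}^{1}e^{-\theta x}\,\mu(dx)=\log\int_{0}^{1}e^{-\theta}e^{\theta y}\,\mu(dy)=-\theta+K(\theta).\qquad(9)$$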

Let $u\in(0,1)$ . Following Legendre duality, $u=K^{\prime}(\theta)$ for a unique $\theta$ . Differentiating (9) gives $K^{\prime}(-\theta)=1-K^{\prime}(\theta)$ , so $1-u=1-K^{\prime}(\theta)=K^{\prime}(-\theta)$ , i.e., $1-u$ and $-\theta$ are unique duals of each other. We compute

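$$I(u)=\theta u-K(\theta)=\theta\big(1-K^{\prime}(-\theta)\big)-\big(K(-\theta)+\theta\big)=(-\theta)K^{\prime}(-\theta)-K(-\theta)=I(1-u).\qquad(10)$$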

This verifies our claim.
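Identity (9) and Proposition 1 can also be checked numerically on a discrete law. The sketch below uses an arbitrary five-point distribution symmetric about $1/2$ and, for contrast, an asymmetric one on the same points. Both are our own illustrative choices, as are the names. $I$ is computed by bisection on $K^{\prime}(\theta)=u$.

```python
import math

POINTS = [0.0, 0.2, 0.5, 0.8, 1.0]
SYMMETRIC = [0.1, 0.2, 0.4, 0.2, 0.1]    # invariant under x -> 1 - x
ASYMMETRIC = [0.3, 0.3, 0.2, 0.1, 0.1]   # not symmetric about 1/2

def K(t, w):
    """CGF of the law putting mass w[k] at POINTS[k], in log-sum-exp form."""
    m = max(t * x for x in POINTS)
    return m + math.log(sum(wk * math.exp(t * x - m) for x, wk in zip(POINTS, w)))

def dK(t, w):
    """K'(t): the mean of the exponentially tilted law."""
    m = max(t * x for x in POINTS)
    e = [wk * math.exp(t * x - m) for x, wk in zip(POINTS, w)]
    return sum(x * ek for x, ek in zip(POINTS, e)) / sum(e)

def I(u, w, lo=-200.0, hi=200.0):
    """Rate function via Legendre duality (bisection on K'(theta) = u)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if dK(mid, w) < u else (lo, mid)
    theta = 0.5 * (lo + hi)
    return theta * u - K(theta, w)

theta = 1.7
print(K(-theta, SYMMETRIC), -theta + K(theta, SYMMETRIC))     # identity (9)
for u in (0.2, 0.35):
    print(u, I(u, SYMMETRIC), I(1 - u, SYMMETRIC), I(u, ASYMMETRIC), I(1 - u, ASYMMETRIC))
```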

4 Maximization analysis

In this section we demonstrate the existence of first order phase transitions in general edge-weighted exponential random graphs. Our main results are Theorem 4.3 and the consequent Corollary 1. In the standard statistical physics literature, phase transition is often associated with loss of analyticity in the normalization constant, which gives rise to discontinuities in the observed graph statistics. In the vicinity of a phase transition, even a tiny change in some local feature can result in a dramatic change of the entire system.

Definition 1

A phase is a connected region of the parameter space $\{\beta\}$ , maximal for the condition that the limiting normalization constant $\psi_{\infty}^{\beta}:=\lim_{n\rightarrow\infty}\psi_{n}^{\beta}$ is analytic. There is a $j$ th-order transition at a boundary point of a phase if at least one $j$ th-order partial derivative of $\psi_{\infty}^{\beta}$ is discontinuous there, while all lower order derivatives are continuous.

Following this philosophy, we will make use of two theorems from Yin2016 , which connect the occurrence of an asymptotic phase transition in our model with the solution of a certain maximization problem for the limiting normalization constant.

Theorem 4.1 (Theorem 3.4 in Yin2016 )

Consider a general $2$ -parameter exponential random graph model (4). Suppose $\beta_{2}$ is non-negative. Then the limiting normalization constant $\psi_{\infty}^{\beta}$ exists, and is given by

$$\psi_{\infty}^{\beta}=\sup_{u}\Big(\beta_{1}u+\beta_{2}u^{p}-\frac{1}{2}I(u)\Big),\qquad(11)$$

where $H_{2}$ is a simple graph with $p\geq 2$ edges, $I$ is the Cramér rate function (6), and the supremum is taken over all $u$ in the domain of $I$ , i.e., where $I<\infty$ .

Theorem 4.2 (Theorem 3.5 in Yin2016 )

Let $G_{n}$ be an exponential random graph drawn from (4). Suppose $\beta_{2}$ is non-negative. Then $G_{n}$ behaves like an Erdős-Rényi graph $G(n,u)$ in the large $n$ limit:

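$$\lim_{n\to\infty}\delta_{\square}\big(\tilde{h}^{G_{n}},\tilde{u}\big)=0\ \text{almost surely},\qquad(12)$$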

where $u$ is picked randomly from the set $U$ of maximizers of (11).

To be more precise, Theorems 4.1 and 4.2 indicate that a typical graph drawn from the exponential random graph model is weakly pseudorandom BBS . Weak pseudorandomness means that with exponentially high probability, a sampled graph satisfies a number of equivalent properties, such as a large spectral gap and the correct number of all subgraph counts, that make it very similar to an Erdős-Rényi graph. Some authors have delved deeper into this “asymptotically equivalent” phenomenon. Mukherjee M considered the two-star model and found that although the model looks like an Erdős-Rényi mixture in cut distance, the same convergence does not hold in total variation. This says that despite being very close to Erdős-Rényi, a graph sampled from the exponential distribution is not exactly Erdős-Rényi. In the case of the edge-triangle model, Radin and Sadun RS2 argued that the two-parameter to one-parameter reduction and the loss of information are essentially due to the inequivalence of grand canonical and microcanonical ensembles of the exponential model in the asymptotic regime. From a practical perspective, however, the Erdős-Rényi approximation for the exponential random graph is already good enough, and we may simply picture an exponential random graph as an Erdős-Rényi graph in the large graph “attractive” limit CD1 .

A significant part of computing phase boundaries for the $2$ -parameter exponential model is then a detailed analysis of a calculus problem coupled with probability estimates. Straightforward as this sounds, the exact form of the Cramér rate function $I$ is not readily obtainable for a generic edge-weight distribution $\mu$ , so getting a clear picture of the asymptotic phase structure is not easy, and various tools, especially the duality principle for the Legendre transform, need to be employed ZRM . We note that our mechanism for $2$ -parameter models may be further generalized to a $k$ -parameter setting, and the crucial idea is to minimize the effect of the ordered parameters on the limiting normalization constant one by one. See Yin2013 for an illustration of this procedure in the standard exponential random graph model (where $\mu$ is Bernoulli $(.5)$ ).

Assumption. Let $p$ be the number of edges in $H_{2}$ . Denote by $K(\theta)$ the cumulant generating function associated with the probability measure $\mu$ . We place a technical assumption: the equation

$$K^{\prime\prime\prime}(\theta)K^{\prime}(\theta)=-(p-2)\left(K^{\prime\prime}(\theta)\right)^{2}$$

admits only one solution on $\mathbb{R}$ .

We remark that this requirement on $\mu$ , which is satisfied by many common distributions including Bernoulli $(.5)$ and Uniform $(0,1)$ , is just a technicality that guarantees the existence of a unique phase transition curve. Without this assumption, there may be more than one phase transition curve. Still, all phase transition curves display the same asymptotic behavior as described in (26), and all graph samples drawn from the “attractive” region of the parameter space are approximately Erdős-Rényi (but with varying densities). The parameter space therefore consists of a single (Erdős-Rényi) phase with first order phase transition(s) across one (or more) curves and second order phase transition(s) along the boundaries, and the transitions correspond to a change in density of the Erdős-Rényi graph.

The meaning of a phase transition in the exponential model thus deserves some careful re-examination. As will be shown in Theorem 4.3, there are curves approaching the phase transition curve from either side along which the corresponding weakly pseudorandom Erdős-Rényi distribution stays constant, and a jump in the Erdős-Rényi parameter $u$ occurs only when the phase transition curve is crossed. This implies that asymptotically the state of the network (represented by $u$ ) does not have a one-to-one correspondence with the associated exponential parameter $\beta$ . (The same defect was observed by Chatterjee and Diaconis CD1 in the unweighted situation.) Some intricate differences between the exponential model and the related Erdős-Rényi model are presented in Sections 4 and 5, particularly through the calculations after Theorem 5.4. The last equation (43) offers a possible way of distinguishing among “equivalent” exponential parameters $\beta$ : the same Erdős-Rényi parameter $u$ with different model parameters $\beta_{2}$ leads to different limiting normalization constants in the exponential model, and the normalization constant encodes important asymptotic information about the system.

Given an observed network that one wishes to model using an exponential random graph model, there may be many parameter values yielding the same weakly pseudorandom Erdős-Rényi distribution, and practitioners need to determine which choice is best. Ideally, those parameters would generate a model whose measurements from simulated realizations reflect the observed network as accurately as possible in every aspect (not just the correct number of subgraph counts as determined by the Erdős-Rényi parameter). Restrictions on the run time of the data collection process may be further imposed. These practical considerations have led to continued interest and advances both theoretically and experimentally in improving goodness of fit and parameter learning Handcock Hunter and developing better model specifications SPRH . As a general principle, good models should produce networks that are structurally similar to the observed network using few but effective parameters, while bad models produce networks that bear little resemblance to the observed network using many unnecessary parameters.

Theorem 4.3

Suppose the common distribution $\mu$ for the edge weights is supported on $[0,1]$ and non-degenerate. For any allowed $H_{2}$ , the limiting normalization constant $\psi_{\infty}^{\beta}$ of (4) is analytic at all $(\beta_{1},\beta_{2})$ in the upper half-plane $(\beta_{2}\geq 0)$ except on a certain decreasing curve $\beta_{2}=r(\beta_{1})$ which includes the endpoint $(\beta_{1}^{c},\beta_{2}^{c})$ . The derivatives $\frac{\partial}{\partial\beta_{1}}\psi_{\infty}^{\beta}$ and $\frac{\partial}{\partial\beta_{2}}\psi_{\infty}^{\beta}$ have (jump) discontinuities across the curve, except at the endpoint, where all the second derivatives $\frac{\partial^{2}}{\partial\beta_{1}^{2}}\psi_{\infty}^{\beta}$ , $\frac{\partial^{2}}{\partial\beta_{1}\partial\beta_{2}}\psi_{\infty}^{\beta}$ and $\frac{\partial^{2}}{\partial\beta_{2}^{2}}\psi_{\infty}^{\beta}$ diverge.

Corollary 1

For any allowed $H_{2}$ , the parameter space $\{(\beta_{1},\beta_{2}):\beta_{2}\geq 0\}$ consists of a single phase with a first order phase transition across the indicated curve $\beta_{2}=r(\beta_{1})$ and a second order phase transition at the critical point $(\beta_{1}^{c},\beta_{2}^{c})$ , qualitatively like the gas/liquid transition in equilibrium materials.

Proof of Theorem 4.3

Let $p$ be the number of edges in $H_{2}$. Denote by $I(u)$ the Cramér rate function associated with the probability measure $\mu$. Define

$$L(u;\beta_{1},\beta_{2})=\beta_{1}u+\beta_{2}u^{p}-\frac{1}{2}I(u) \tag{14}$$

for $u\in[0,1]$. We consider the maximization problem for $L(u;\beta_{1},\beta_{2})$ on the interval $[0,1]$, where $-\infty<\beta_{1}<\infty$ and $0\leq\beta_{2}<\infty$ are parameters. By Theorem 4.1, the supremum should actually be taken over the domain of $I$, which, by the discussion following Lemma 2, might differ from $[0,1]$ at the endpoints. However, when the domain of $I$ does not include $0$ (or $1$), $L(0)=-\infty$ (or $L(1)=-\infty$), so that endpoint cannot be the maximizer. To locate the maximizers of $L(u)$, we examine the properties of $L^{\prime}(u)$ and $L^{\prime\prime}(u)$,

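$$L^{\prime}(u)=\beta_{1}+p\beta_{2}u^{p-1}-\frac{1}{2}I^{\prime}(u),\qquad L^{\prime\prime}(u)=p(p-1)\beta_{2}u^{p-2}-\frac{1}{2}I^{\prime\prime}(u). \tag{15}$$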

Utilizing the duality principle for the Legendre transform between $I(u)$ and $K(\theta)$ , we first analyze properties of $L^{\prime\prime}(u)$ on the interval $(0,1)$ . As a consequence of the Legendre transform,

$$I(u)+K(\theta)=\theta u, \tag{16}$$

where $\theta$ and $u$ are unique duals of each other. Taking derivatives, we find that

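$$I^{\prime}(u)=\theta,\qquad K^{\prime}(\theta)=u,\qquad I^{\prime\prime}(u)=\frac{1}{K^{\prime\prime}(\theta)}. \tag{17}$$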

Consider the function

$$m(u)=\frac{I^{\prime\prime}(u)}{2p(p-1)u^{p-2}} \tag{18}$$

on $(0,1)$ . By (17), we may analyze the properties of $m(u)$ through the function

$$n(\theta)=2p(p-1)\left(K^{\prime}(\theta)\right)^{p-2}K^{\prime\prime}(\theta), \tag{19}$$

where $\theta\in\operatorname{\mathbb{R}}$ and $m(u)n(\theta)=1$ . From the discussion following Lemma 2, we recognize that

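$$K(\theta)=\log\mathbb{E}\left(e^{\theta X}\right),\qquad K^{\prime}(\theta)=\frac{\mathbb{E}\left(Xe^{\theta X}\right)}{\mathbb{E}\left(e^{\theta X}\right)},\qquad K^{\prime\prime}(\theta)=\frac{\mathbb{E}\left(X^{2}e^{\theta X}\right)}{\mathbb{E}\left(e^{\theta X}\right)}-\left(\frac{\mathbb{E}\left(Xe^{\theta X}\right)}{\mathbb{E}\left(e^{\theta X}\right)}\right)^{2}, \tag{20}$$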

where $X$ is a random variable distributed according to $\mu$ . Since

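$$n^{\prime}(\theta)=2p(p-1)\left(K^{\prime}(\theta)\right)^{p-3}\left[(p-2)\left(K^{\prime\prime}(\theta)\right)^{2}+K^{\prime}(\theta)K^{\prime\prime\prime}(\theta)\right] \tag{21}$$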

and $K^{\prime}(\theta)>0$ always, under Assumption there exists a unique $\theta_{0}$ such that $n^{\prime}(\theta_{0})=0$. This unique global maximizer $\theta_{0}$ of $n(\theta)$ corresponds to a unique global minimizer of $m(u)$, which we denote by $u_{0}$. Using duality, $m(u)>0$ for all $u\in(0,1)$, and $m(u)$ grows without bound at both ends. By (15) and (18), $L^{\prime\prime}(u)=p(p-1)u^{p-2}\left(\beta_{2}-m(u)\right)$, so the sign of $L^{\prime\prime}(u)$ is that of $\beta_{2}-m(u)$. For $\beta_{2}\leq m(u_{0})$, $L^{\prime\prime}(u)\leq 0$ on $(0,1)$. For $\beta_{2}>m(u_{0})$, $L^{\prime\prime}(u)<0$ for $0<u<u_{1}$ and $u_{2}<u<1$, and $L^{\prime\prime}(u)>0$ for $u_{1}<u<u_{2}$, where the transition points $u_{1}$ and $u_{2}$ satisfy $L^{\prime\prime}(u_{1})=L^{\prime\prime}(u_{2})=0$. Sign properties of $L^{\prime\prime}(u)$ translate into monotonicity properties of $L^{\prime}(u)$ over $(0,1)$. For $\beta_{2}\leq m(u_{0})$, $L^{\prime}(u)$ is decreasing over $(0,1)$. For $\beta_{2}>m(u_{0})$, $L^{\prime}(u)$ is decreasing from $0$ to $u_{1}$, increasing from $u_{1}$ to $u_{2}$, and decreasing from $u_{2}$ to $1$. See Figure 2 for an illustrative plot of $n(\theta)$ and $m(u)$.
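As a minimal numerical sketch of this sign analysis (an illustration, not part of the original argument), take Bernoulli $(.5)$ edge weights. Then $I(u)=u\log u+(1-u)\log(1-u)+\log 2$, so $I^{\prime\prime}(u)=1/(u(1-u))$ and $m(u)=1/\left(2p(p-1)u^{p-1}(1-u)\right)$, whose minimizer is $u_{0}=(p-1)/p$. The hypothetical helper `inflection_points` below finds the transition points $u_{1}<u_{0}<u_{2}$ solving $m(u)=\beta_{2}$ by bisection; for $p=2$ they can be checked against $u_{1,2}=\left(1\mp\sqrt{1-1/\beta_{2}}\right)/2$.

```python
import math

def m_bern(u, p):
    # m(u) = I''(u) / (2p(p-1)u^(p-2)) with I''(u) = 1/(u(1-u)) for Bernoulli(1/2)
    return 1.0 / (2 * p * (p - 1) * u ** (p - 1) * (1.0 - u))

def inflection_points(beta2, p):
    """Return (u1, u2) with m(u1) = m(u2) = beta2, or None if beta2 <= m(u0)."""
    u0 = (p - 1) / p                 # unique minimiser of m for Bernoulli(1/2)
    if beta2 <= m_bern(u0, p):
        return None                  # L''(u) <= 0 on (0, 1): L'(u) is decreasing

    def bisect(a, b):
        # m(u) - beta2 changes sign on (a, b); keep the half-interval with the sign change
        fa = m_bern(a, p) - beta2
        for _ in range(200):
            c = 0.5 * (a + b)
            fc = m_bern(c, p) - beta2
            if (fc > 0) == (fa > 0):
                a, fa = c, fc
            else:
                b = c
        return 0.5 * (a + b)

    eps = 1e-12
    return bisect(eps, u0), bisect(u0, 1.0 - eps)

u1, u2 = inflection_points(2.0, 2)
print(u1, u2)                        # L''(u) > 0 exactly for u1 < u < u2
print(inflection_points(0.9, 2))     # m(u0) = 1 for p = 2, so this prints None
```

The same helper works for any $p\geq 2$, since only the closed form of $m$ changes with the edge-weight distribution.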

The analytic properties of $L^{\prime\prime}(u)$ and $L^{\prime}(u)$ determine those of $L(u)$ on the interval $[0,1]$. By the duality of the Legendre transform (16) and (17), $I(u)$ is a smooth convex function with $I^{\prime}(0)=-\infty$ and $I^{\prime}(1)=\infty$. Therefore $L^{\prime}(0)=\infty$ and $L^{\prime}(1)=-\infty$, so $L(u)$ cannot be maximized at $u=0$ or $u=1$. For $\beta_{2}\leq m(u_{0})$, $L^{\prime}(u)$ decreases from $\infty$ at $0$ to $-\infty$ at $1$, crossing the $u$-axis only once. This zero, which we denote by $u^{\ast}$, is the unique global maximizer of $L(u)$. Now consider $\beta_{2}>m(u_{0})$. If $L^{\prime}(u_{1})\geq 0$, then $L^{\prime}(u)$ has a unique zero greater than $u_{2}$, and so $L(u)$ has a unique global maximizer $u^{\ast}>u_{2}$. If $L^{\prime}(u_{2})\leq 0$, then $L^{\prime}(u)$ has a unique zero less than $u_{1}$, and so $L(u)$ has a unique global maximizer $u^{\ast}<u_{1}$. Lastly, suppose that $L^{\prime}(u_{1})<0<L^{\prime}(u_{2})$. Then $L^{\prime}(u)$ has three zeros, and $L(u)$ has two local maximizers, which we denote by $u^{\ast}_{1}$ and $u^{\ast}_{2}$, with $0<u^{\ast}_{1}<u_{1}<u_{0}<u_{2}<u^{\ast}_{2}<1$. See Figure 3 for an illustrative plot of $L(u)$ in this case.
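The case analysis can be illustrated numerically, again assuming Bernoulli $(.5)$ edge weights. In the dual variable $\theta=\log(u/(1-u))$ one has $L^{\prime}(u)=\beta_{1}+p\beta_{2}u^{p-1}-\theta/2$; every zero of $L^{\prime}$ has $\theta\in[2\beta_{1},2\beta_{1}+2p\beta_{2}]$ because $0<u<1$, and a local maximizer of $L$ is a point where $L^{\prime}$ changes sign from positive to negative. The function name `local_maximizers` and the grid-plus-bisection search are illustrative choices rather than the authors' code, and the sketch assumes $\beta_{2}>0$.

```python
import math

def sigmoid(t):
    # K'(theta) for Bernoulli(1/2): maps the dual variable theta to u
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def local_maximizers(beta1, beta2, p, steps=4000):
    """Local maximisers u of L(u) = beta1*u + beta2*u**p - I(u)/2 (Bernoulli(1/2), beta2 > 0).

    With theta = log(u/(1-u)), L'(u) = -g(theta)/2 where
    g(theta) = theta - 2*beta1 - 2*p*beta2*u**(p-1).  All zeros of g lie in
    [2*beta1, 2*beta1 + 2*p*beta2]; since u increases with theta, a local
    maximum of L is a zero where g crosses from negative to non-negative.
    """
    def g(t):
        return t - 2 * beta1 - 2 * p * beta2 * sigmoid(t) ** (p - 1)

    lo, hi = 2 * beta1, 2 * beta1 + 2 * p * beta2
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [g(t) for t in grid]
    found = []
    for i in range(steps):
        if vals[i] < 0 <= vals[i + 1]:
            a, b = grid[i], grid[i + 1]
            for _ in range(100):          # refine the bracketed zero by bisection
                c = 0.5 * (a + b)
                if g(c) < 0:
                    a = c
                else:
                    b = c
            found.append(sigmoid(0.5 * (a + b)))
    return found

print(local_maximizers(-8.0, 8.0, 2))   # two local maximisers
print(local_maximizers(0.0, 0.5, 2))    # beta2 < m(u0) = 1: a single maximiser
```

Working in $\theta$ rather than $u$ resolves maximizers extremely close to $0$ or $1$, which a uniform grid in $u$ would miss.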

Define

$$f(u)=\frac{uI^{\prime\prime}(u)}{2(p-1)}-\frac{1}{2}I^{\prime}(u). \tag{22}$$

Using $m(u_{1})=m(u_{2})=\beta_{2}$ from (18), we obtain $L^{\prime}(u_{1})=\beta_{1}+f(u_{1})$ and $L^{\prime}(u_{2})=\beta_{1}+f(u_{2})$. We compute

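$$f^{\prime}(u)=\frac{uI^{\prime\prime\prime}(u)-(p-2)I^{\prime\prime}(u)}{2(p-1)}=pu^{p-1}m^{\prime}(u). \tag{23}$$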

As a consequence of the relation between $f^{\prime}$ and $m^{\prime}$ and the previous analysis of $m$, $f$ is decreasing on $(0,u_{0})$ and increasing on $(u_{0},1)$. We check that, like $m$, $f$ grows without bound at both ends. Taking $u\rightarrow 0$ corresponds to taking $\theta\rightarrow-\infty$ in the dual space (16) and (17), and the divergence is clear from the discussion following Lemma 2. To see that $f(u)$ diverges as $u\rightarrow 1$, we use (23). By the fundamental theorem of calculus,

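$$f(u)=f(u_{0})+\int_{u_{0}}^{u}ps^{p-1}m^{\prime}(s)\,ds\geq f(u_{0})+pu_{0}^{p-1}\left(m(u)-m(u_{0})\right),\qquad u_{0}<u<1, \tag{24}$$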

and the right-hand side grows to infinity as $u$ approaches $1$, since $m(u)\rightarrow\infty$. With $X$ a random variable distributed according to $\mu$, we record the following formulas for $f$ and $m$ for future reference:

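$$f(u)=\frac{K^{\prime}(\theta)}{2(p-1)K^{\prime\prime}(\theta)}-\frac{\theta}{2}=\frac{\mathbb{E}(Xe^{\theta X})\,\mathbb{E}(e^{\theta X})}{2(p-1)\left[\mathbb{E}(X^{2}e^{\theta X})\,\mathbb{E}(e^{\theta X})-\left(\mathbb{E}(Xe^{\theta X})\right)^{2}\right]}-\frac{\theta}{2},$$
$$m(u)=\frac{1}{2p(p-1)\left(K^{\prime}(\theta)\right)^{p-2}K^{\prime\prime}(\theta)}=\frac{\left(\mathbb{E}(e^{\theta X})\right)^{p}}{2p(p-1)\left(\mathbb{E}(Xe^{\theta X})\right)^{p-2}\left[\mathbb{E}(X^{2}e^{\theta X})\,\mathbb{E}(e^{\theta X})-\left(\mathbb{E}(Xe^{\theta X})\right)^{2}\right]}, \tag{25}$$
where $\theta$ is the dual of $u$.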

For $L^{\prime}(u_{1})<0$ to hold, we must have $\beta_{1}<-f(u_{1})$. Since $f$ attains its absolute minimum at $u_{0}$, $f(u_{1})>f(u_{0})$, and hence $\beta_{1}<-f(u_{0})$. The region in the $(\beta_{1},\beta_{2})$ plane where $L^{\prime}(u_{1})<0<L^{\prime}(u_{2})$ is therefore contained in the region $\beta_{1}<-f(u_{0})$, $\beta_{2}>m(u_{0})$. Denote these two critical values for $\beta_{1}$ and $\beta_{2}$ by $\beta_{1}^{c}:=-f(u_{0})$ and $\beta_{2}^{c}:=m(u_{0})$.
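For Bernoulli $(.5)$ edge weights the critical point can be computed directly from (25), since $K^{\prime}(\theta)=e^{\theta}/(1+e^{\theta})$ and $K^{\prime\prime}(\theta)=K^{\prime}(\theta)\left(1-K^{\prime}(\theta)\right)$. The sketch below uses a hypothetical `critical_point` helper with golden-section search to maximize $n(\theta)$. It compares the result with the closed forms $u_{0}=(p-1)/p$, $\beta_{1}^{c}=\frac{1}{2}\log(p-1)-\frac{p}{2(p-1)}$ and $\beta_{2}^{c}=\frac{p^{p-1}}{2(p-1)^{p}}$, which follow from (18) and (22) with $I^{\prime\prime}(u)=1/(u(1-u))$. For $p=2$ this is $(-1,1)$, matching $\left(-1/(4\operatorname{Var}(X)),1/(4\operatorname{Var}(X))\right)$ with $\operatorname{Var}(X)=1/4$.

```python
import math

def sigmoid(t):
    # K'(theta) for Bernoulli(1/2)
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def golden_max(fn, a, b, tol=1e-10):
    """Golden-section search for the maximiser of a unimodal function on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = fn(c), fn(d)
    while b - a > tol:
        if fc > fd:                      # maximiser lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = fn(c)
        else:                            # maximiser lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = fn(d)
    return 0.5 * (a + b)

def critical_point(p):
    """Return (u0, beta1_c, beta2_c) for Bernoulli(1/2) edge weights, using (19) and (25)."""
    k1 = sigmoid                                                # K'(theta)
    k2 = lambda t: sigmoid(t) * (1.0 - sigmoid(t))             # K''(theta)
    n = lambda t: 2 * p * (p - 1) * k1(t) ** (p - 2) * k2(t)   # n(theta) = 1/m(u)
    theta0 = golden_max(n, -20.0, 20.0)
    f0 = k1(theta0) / (2 * (p - 1) * k2(theta0)) - theta0 / 2  # f(u0)
    return k1(theta0), -f0, 1.0 / n(theta0)

for p in (2, 3, 4):
    print(p, critical_point(p))
```

Because both $f$ and $m$ are stationary at $u_{0}$, small errors in the located $\theta_{0}$ affect the critical values only at second order.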

Recall that $u_{1}<u_{0}<u_{2}$ . By monotonicity of $f(u)$ on the intervals $(0,u_{0})$ and $(u_{0},1)$ , there exist continuous functions $a(\beta_{1})$ and $b(\beta_{1})$ of $\beta_{1}$ , such that $L^{\prime}(u_{1})<0$ for $u_{1}>a(\beta_{1})$ and $L^{\prime}(u_{2})>0$ for $u_{2}>b(\beta_{1})$ . As $\beta_{1}\rightarrow-\infty$ , $a(\beta_{1})\rightarrow 0$ and $b(\beta_{1})\rightarrow 1$ . $a(\beta_{1})$ is an increasing function of $\beta_{1}$ , whereas $b(\beta_{1})$ is a decreasing function, and they satisfy $f(a(\beta_{1}))=f(b(\beta_{1}))=-\beta_{1}$ . The restrictions on $u_{1}$ and $u_{2}$ yield restrictions on $\beta_{2}$ , and we have $L^{\prime}(u_{1})<0$ for $\beta_{2}<m(a(\beta_{1}))$ and $L^{\prime}(u_{2})>0$ for $\beta_{2}>m(b(\beta_{1}))$ . As $\beta_{1}\rightarrow-\infty$ , $m(a(\beta_{1}))\rightarrow\infty$ and $m(b(\beta_{1}))\rightarrow\infty$ . $m(a(\beta_{1}))$ and $m(b(\beta_{1}))$ are both decreasing functions of $\beta_{1}$ , and they satisfy $L^{\prime}(u_{1})=0$ when $\beta_{2}=m(a(\beta_{1}))$ and $L^{\prime}(u_{2})=0$ when $\beta_{2}=m(b(\beta_{1}))$ . As $L^{\prime}(u_{2})>L^{\prime}(u_{1})$ for every $(\beta_{1},\beta_{2})$ , the curve $m(b(\beta_{1}))$ lies below the curve $m(a(\beta_{1}))$ , and together they generate the bounding curves of the $V$ -shaped region in the $(\beta_{1},\beta_{2})$ plane with corner point $(\beta_{1}^{c},\beta_{2}^{c})$ where two local maximizers exist for $L(u)$ . By (23), for sufficiently negative values of $\beta_{1}$ , $f(a(\beta_{1}))<m(a(\beta_{1}))$ and $f(b(\beta_{1}))>m(b(\beta_{1}))$ , so the straight line $\beta_{1}=-\beta_{2}$ lies within this region.

Fix an arbitrary $\beta_{1}<\beta_{1}^{c}$. Then $L^{\prime}(u)$ shifts upward as $\beta_{2}$ increases and downward as $\beta_{2}$ decreases. As a result, as $\beta_{2}$ grows, the positive area bounded by the curve $L^{\prime}(u)$ increases, whereas the negative area decreases. By the fundamental theorem of calculus, the difference between the positive and negative areas is the difference between $L(u_{2}^{*})$ and $L(u_{1}^{*})$, which goes from negative ($L^{\prime}(u_{2})=0$, $u_{1}^{*}$ is the global maximizer) to positive ($L^{\prime}(u_{1})=0$, $u_{2}^{*}$ is the global maximizer) as $\beta_{2}$ goes from $m(b(\beta_{1}))$ to $m(a(\beta_{1}))$. Thus there is a unique $\beta_{2}$ with $m(b(\beta_{1}))<\beta_{2}<m(a(\beta_{1}))$ such that $u_{1}^{*}$ and $u_{2}^{*}$ are both global maximizers, and we denote this $\beta_{2}$ by $r(\beta_{1})$. The parameter values $(\beta_{1},r(\beta_{1}))$ are exactly those for which the positive and negative areas bounded by $L^{\prime}(u)$ are equal. An increase in $\beta_{1}$ induces an upward shift of $L^{\prime}(u)$, which may be balanced by a decrease in $\beta_{2}$. Similarly, a decrease in $\beta_{1}$ induces a downward shift of $L^{\prime}(u)$, which may be balanced by an increase in $\beta_{2}$. This shows that $r(\beta_{1})$ is monotonically decreasing in $\beta_{1}$. See Figure 1, where $p=2$ and $X$ is distributed according to Beta $(2,2)$, so that $\mathbb{E}(X)=1/2$ and $\operatorname{\mathrm{Var}}(X)=1/20$. By Lemma 1, $\theta_{0}=0$ and $u_{0}=\mathbb{E}(X)=1/2$, which by (25) gives $(\beta_{1}^{c},\beta_{2}^{c})=(-5,5)$. Also see Figure 1 in RY and Figure 1 in Yin2016 for related phase transition plots when the edge-weight distribution $\mu$ is Bernoulli $(.5)$ and Uniform $(0,1)$, respectively.
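The Beta $(2,2)$ value $(\beta_{1}^{c},\beta_{2}^{c})=(-5,5)$ can be reproduced numerically. The sketch below is an illustration rather than the authors' computation. It approximates $K^{\prime}(\theta)$ and $K^{\prime\prime}(\theta)$ by composite Simpson quadrature against the density $6x(1-x)$, locates $\theta_{0}$ by golden-section search on $n(\theta)$ with $p=2$, and then evaluates (25). The quadrature size and the search interval are arbitrary assumptions.

```python
import math

def tilted_stats(theta, n=2000):
    """(K'(theta), K''(theta)) for X ~ Beta(2,2), density 6x(1-x), by composite Simpson.

    Common factors (Simpson's h/3 and e^{-max(theta, 0)}) cancel in the ratios.
    """
    shift = max(theta, 0.0)
    m0 = m1 = m2 = 0.0
    for i in range(n + 1):
        x = i / n
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        d = w * 6.0 * x * (1.0 - x) * math.exp(theta * x - shift)
        m0 += d
        m1 += d * x
        m2 += d * x * x
    mean = m1 / m0
    return mean, m2 / m0 - mean * mean

def golden_max(fn, a, b, tol=1e-7):
    """Golden-section search for the maximiser of a unimodal function on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = fn(c), fn(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = fn(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = fn(d)
    return 0.5 * (a + b)

p = 2

def n_of_theta(t):
    k1, k2 = tilted_stats(t)
    return 2 * p * (p - 1) * k1 ** (p - 2) * k2      # n(theta) from (19)

theta0 = golden_max(n_of_theta, -10.0, 10.0)
k1, k2 = tilted_stats(theta0)
beta1_c = -(k1 / (2 * (p - 1) * k2) - theta0 / 2)    # -f(u0), from (25)
beta2_c = 1.0 / n_of_theta(theta0)                    # m(u0), from (25)
print(f"theta0={theta0:.2e}  u0={k1:.6f}  critical point=({beta1_c:.6f}, {beta2_c:.6f})")
```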

The rest of the proof follows as in the proof of the corresponding result (Theorem 2.1) in Radin and Yin RY , where some probability estimates were used. A (jump) discontinuity in the first derivatives of $\psi_{\infty}^{\beta}$ across the curve $\beta_{2}=r(\beta_{1})$ indicates a discontinuity in the expected local densities, while the divergence of the second derivatives of $\psi_{\infty}^{\beta}$ at the critical point $(\beta_{1}^{c},\beta_{2}^{c})$ implies that the covariances of the local densities go to zero more slowly than $1/n^{2}$ . We omit the proof details.

Remark 1

The maximization problem (11) is solved at a unique value $u^{*}$ off the phase transition curve $\beta_{2}=r(\beta_{1})$, and at two values $u_{1}^{*}$ and $u_{2}^{*}$ along the curve. As $\beta_{1}\rightarrow-\infty$ (equivalently $\beta_{2}\rightarrow\infty$) along the curve, $u_{1}^{*}\rightarrow 0$ and $u_{2}^{*}\rightarrow 1$. The jump from $u_{1}^{*}$ to $u_{2}^{*}$ is quite noticeable even for moderate parameter values. For example, taking $p=2$, $\beta_{1}=-8$, and $\beta_{2}=8$ with Beta $(2,2)$ edge weights, numerical computations yield $u_{1}^{*}\approx 0.165$ and $u_{2}^{*}\approx 0.835$.
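One way to reproduce these numbers, using Simpson quadrature for Beta $(2,2)$ as a stand-in for exact integrals, is to solve the stationarity condition $\theta=2\beta_{1}+2p\beta_{2}\left(K^{\prime}(\theta)\right)^{p-1}$ in the dual variable (this is $L^{\prime}(u)=0$ rewritten with $I^{\prime}(u)=\theta$). The sketch keeps the upward zero crossings, which correspond to local maximizers of $L$, and evaluates $L$ there using $I(u)=\theta u-K(\theta)$. All function names are hypothetical.

```python
import math

def tilted(theta, n=1000):
    """(K(theta), K'(theta)) for X ~ Beta(2,2) via composite Simpson's rule."""
    shift = max(theta, 0.0)                  # keeps exp() from overflowing
    h = 1.0 / n
    m0 = m1 = 0.0
    for i in range(n + 1):
        x = i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        d = w * 6.0 * x * (1.0 - x) * math.exp(theta * x - shift)
        m0 += d
        m1 += d * x
    m0 *= h / 3.0
    return shift + math.log(m0), (m1 * h / 3.0) / m0

def local_maxima(beta1, beta2, p, steps=320):
    """(u, L(u)) at each local maximiser of L, from upward zero crossings of
    g(theta) = theta - 2*beta1 - 2*p*beta2*K'(theta)**(p-1); assumes beta2 > 0."""
    def g(t):
        return t - 2 * beta1 - 2 * p * beta2 * tilted(t)[1] ** (p - 1)

    lo, hi = 2 * beta1, 2 * beta1 + 2 * p * beta2
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [g(t) for t in grid]
    out = []
    for i in range(steps):
        if vals[i] < 0 <= vals[i + 1]:
            a, b = grid[i], grid[i + 1]
            for _ in range(60):              # bisection on the bracketing cell
                c = 0.5 * (a + b)
                a, b = (c, b) if g(c) < 0 else (a, c)
            t = 0.5 * (a + b)
            K, u = tilted(t)
            out.append((u, beta1 * u + beta2 * u ** p - 0.5 * (t * u - K)))  # I(u) = theta*u - K
    return out

(u1, L1), (u2, L2) = local_maxima(-8.0, 8.0, 2)
print(f"u1* = {u1:.3f}, u2* = {u2:.3f}, L(u1*) - L(u2*) = {L1 - L2:.1e}")
```

Since $(-8,8)$ lies on the line $\beta_{2}=-\beta_{1}$ and the distribution is symmetric with $p=2$, the two maxima have equal height, consistent with this point lying on the phase transition curve.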

5 Universal asymptotics

In this section we examine near degeneracy and universality in general edge-weighted exponential random graphs. All our findings in this section assume that the non-degenerate probability measure $\mu$ for the edge weights is symmetric about the line $u=1/2$. We remark that near degeneracy and universality are expected even when the edge weights are not symmetrically distributed, except that the universal straight line is shifted vertically from $\beta_{2}=-\beta_{1}$.

Proposition 2

Consider a non-degenerate probability measure $\mu$ supported on $[0,1]$ and symmetric about the line $u=1/2$. Take $H_{1}$ a single edge and $H_{2}$ a finite simple graph with $p\geq 2$ edges. The phase transition curve $\beta_{2}=r(\beta_{1})$ lies above the straight line $\beta_{2}=-\beta_{1}$ when $p\geq 3$, and is exactly the portion of the straight line $\beta_{2}=-\beta_{1}$ with $\beta_{1}\leq-1/(4\operatorname{\mathrm{Var}}(X))$ when $p=2$. Here $X$ is a random variable distributed according to $\mu$.

Proof

From the proof of Theorem 4.3, there are two global maximizers $u_{1}^{*}$ and $u_{2}^{*}$ for $L(u)$ along the phase transition curve $\beta_{2}=r(\beta_{1})$, with $0<u_{1}^{*}<u_{0}<u_{2}^{*}<1$, where $u_{0}$ is the unique global minimizer of $m(u)$ (18). By Lemma 1, $u_{0}=1/2$ when $p=2$ and $u_{0}>1/2$ when $p>2$. Furthermore, the $y$-coordinate $\beta_{2}^{c}$ of the critical point $(\beta_{1}^{c},\beta_{2}^{c})=(-f(u_{0}),m(u_{0}))$ is always positive. On the straight line $\beta_{1}+\beta_{2}=0$, we rewrite $L(u)=\beta_{1}(u-u^{p})-I(u)/2$. By Proposition 1, $I(u)$ is symmetric about the line $u=1/2$. First suppose $p=2$. Since $I(u)$ and $u-u^{2}$ are both symmetric about $u=1/2$, we have $L(u)=L(1-u)$ on this line, so the two local maximizers $u_{1}^{*}$ and $u_{2}^{*}$ are both global maximizers, and by (25) $(-f(u_{0}),m(u_{0}))=\left(-1/(4\operatorname{\mathrm{Var}}(X)),1/(4\operatorname{\mathrm{Var}}(X))\right)$. Next consider the generic case $p\geq 3$. Analytical calculations give $u-u^{p}<(1-u)-(1-u)^{p}$ for $0<u<1/2$. Since $I(u)$ is symmetric, for $\beta_{1}<0$ (equivalently $\beta_{2}>0$) this gives $L(u)-L(1-u)=\beta_{1}\left[(u-u^{p})-\left((1-u)-(1-u)^{p}\right)\right]>0$ for $0<u<1/2$, so the global maximizer $u^{*}$ of $L(u)$ satisfies $u^{*}\leq 1/2$ and must be $u_{1}^{*}$. Hence the line $\beta_{2}=-\beta_{1}$ lies below the phase transition curve, and the conclusion follows.
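The elementary inequality used for $p\geq 3$ is easy to spot-check numerically. The grid check below is a sanity check only, not a proof. It also confirms that the gap vanishes identically when $p=2$, which is why the line $\beta_{2}=-\beta_{1}$ itself becomes the transition curve in that case. The name `gap` is illustrative.

```python
def gap(u, p):
    """(1-u) - (1-u)**p - (u - u**p); Proposition 2 needs this > 0 on (0, 1/2) for p >= 3."""
    v = 1.0 - u
    return (v - v ** p) - (u - u ** p)

grid = [i / 1000 for i in range(1, 500)]                      # points of (0, 1/2)
positive = {p: min(gap(u, p) for u in grid) > 0 for p in range(3, 11)}
p2_size = max(abs(gap(u, 2)) for u in grid)                   # vanishes identically for p = 2
print(positive)
print(p2_size)
```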

Proposition 3

Consider a non-degenerate probability measure $\mu$ supported on $[0,1]$ and symmetric about the line $u=1/2$ . Assume the associated Cramér rate function (6) is bounded on $[0,1]$ (i.e. $I(0)=I(1)$ is finite). Take $H_{1}$ a single edge and $H_{2}$ a finite simple graph with $p\geq 2$ edges. The phase transition curve $\beta_{2}=r(\beta_{1})$ displays a universal asymptotic behavior as $\beta_{1}\to-\infty$ , specifically,

$$\lim_{\beta_{1}\to-\infty}\left(r(\beta_{1})+\beta_{1}\right)=0. \tag{26}$$

Proof

Let $\beta_{2}=-\beta_{1}+\delta$ with $\delta>0$ fixed. Define $F(u;\beta_{1})=\beta_{1}(u-u^{p})$ and $G(u;\delta)=\delta u^{p}-I(u)/2$, so that $L(u;\beta_{1},\beta_{2})=F(u;\beta_{1})+G(u;\delta)$ by (14). We will show that, for sufficiently negative $\beta_{1}$, the global maximizer $u^{*}$ of $L(u)$ equals $u_{2}^{*}$. Together with Proposition 2, this implies that $-\beta_{1}\leq r(\beta_{1})\leq-\beta_{1}+\delta$ for these $\beta_{1}$; since $\delta>0$ is arbitrary, this proves the desired limit.

Under our assumption, $-I(u)$ is a continuous symmetric function that increases on $(0,1/2)$ and decreases on $(1/2,1)$, with maximum $-I(1/2)=0$ attained at $u=1/2$. Set $C:=-I(0)/2=-I(1)/2$, so that $C$ is finite and negative and $G(0)=C$. Recall that $0<u_{1}^{*}<u_{0}<u_{2}^{*}<1$, where $u_{1}^{*}$ and $u_{2}^{*}$ are the two local maximizers of $L(u)$ and $u_{0}\geq 1/2$ is the unique global minimizer of $m(u)$ (18), which does not depend on $\beta_{1}$ or $\beta_{2}$. Strictly speaking, only one of the local maximizers $u_{1}^{*}$, $u_{2}^{*}$ may exist, but this does not affect the argument below. By the continuity and boundedness of $G$ on $[0,1]$, there exists $\eta\in(0,1-u_{0})$ such that $G(u)-C<\delta/2$ whenever $0\leq u<\eta$. Since $u-u^{p}=u(1-u^{p-1})>0$ on $(0,1)$ and vanishes only at the endpoints $0$ and $1$, it is bounded away from zero on $[\eta,1-\eta]$, so there exists $\bar{\beta}<0$ such that $F(u)<C-\delta$ for all $\beta_{1}<\bar{\beta}$ and $u\in[\eta,1-\eta]$. As $G(u)\leq\delta u^{p}<\delta$ on $[\eta,1-\eta]$, this gives $L(u)<C-\delta+G(u)<C=L(0)$, so $u^{*}\in[0,\eta)\cup(1-\eta,1]$. Similarly, since $F(u)\leq 0$ for all $\beta_{1}<0$ and all $u\in[0,\eta)$, we have $L(u)\leq G(u)<C+\delta/2<C+\delta=L(1)$, so $u^{*}\in(1-\eta,1]$. Since $u_{1}^{*}<u_{0}<1-\eta$, this says that $u^{*}=u_{2}^{*}$.

Propositions 2 and 3 have advanced our understanding of phase transitions in edge-weighted exponential random graphs, yet some fundamental questions remain unanswered. As explained in Section 4, a typical graph sampled from the exponential model looks like an Erdős-Rényi graph $G(n,u)$ in the large $n$ limit, where the asymptotic edge presence probability $u(\beta_{1},\beta_{2})\rightarrow 0$ or $1$ is prescribed by the maximization problem (11). However, the rate at which $u$ approaches these two degenerate states is not at all clear. When a typical graph is sparse ($u\rightarrow 0$), how sparse is it? When a typical graph is nearly complete ($u\rightarrow 1$), how complete is it? Can we give an explicit characterization of the near-degenerate graph structure as a function of the parameters? Theorems 5.1 and 5.2 below address these questions. Theorem 5.1 shows that $\theta$, the dual of the Erdős-Rényi parameter $u$, displays universal asymptotic behavior in the sparse region of the parameter space ($\beta_{1}<-\beta_{2}$ and $\beta_{2}\geq 0$), whereas $u$ itself depends on the specific edge-weight distribution $\mu$. Theorem 5.2 provides the corresponding result in the nearly complete region of the parameter space ($\beta_{1}>-\beta_{2}$ and $\beta_{2}\geq 0$), showing that the dual $\theta$ again displays universal asymptotic behavior whereas the Erdős-Rényi parameter $u$ still depends on $\mu$.

Theorem 5.1

Consider a non-degenerate probability measure $\mu$ supported on $[0,1]$ and symmetric about the line $u=1/2$ . Take $H_{1}$ a single edge and $H_{2}$ a finite simple graph with $p\geq 2$ edges. Let $\beta_{1}<-\beta_{2}$ and $\beta_{2}\geq 0$ . For large $n$ and $(\beta_{1},\beta_{2})$ sufficiently far away from the origin, a typical graph drawn from the model looks like an Erdős-Rényi graph $G(n,u)$ , where the edge presence probability $u$ depends on the distribution $\mu$ , but its dual $\theta$ universally satisfies $\theta\asymp 2\beta_{1}$ .

Proof

Let $\beta_{1}=a\beta_{2}$ with $a<-1$ . Resorting to Legendre duality, (11) gives a condition on $\theta$ , the dual of $u$ :

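$$\theta=2\beta_{1}+2p\beta_{2}\left(K^{\prime}(\theta)\right)^{p-1}=2\beta_{2}\left(a+p\left(K^{\prime}(\theta)\right)^{p-1}\right). \tag{27}$$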

By Proposition 2, $u\rightarrow 0$ for $(\beta_{1},\beta_{2})$ sufficiently far away from the origin, which corresponds to $\theta\rightarrow-\infty$ in the dual space. Since $K^{\prime}(\theta)\rightarrow 0$ as $\theta\rightarrow-\infty$ (Table 1), we have

$$\frac{\theta}{2\beta_{1}}=1+\frac{p}{a}\left(K^{\prime}(\theta)\right)^{p-1}\rightarrow 1. \tag{28}$$

The universal asymptotics of $\theta\asymp 2\beta_{1}$ is verified.

We claim that $u$, on the other hand, depends on the specific distribution $\mu$. We derive the asymptotics of $u$ in two special cases, Bernoulli $(.5)$ and Uniform $(0,1)$. In both cases, $u=K^{\prime}(\theta)$ by Legendre duality. For Bernoulli $(.5)$,

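$$u=K^{\prime}(\theta)=\frac{e^{\theta}}{1+e^{\theta}},\qquad\text{so}\qquad\log u\asymp\theta\asymp 2\beta_{1}. \tag{29}$$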

For Uniform $(0,1)$, on the other hand,

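$$u=K^{\prime}(\theta)=\frac{e^{\theta}}{e^{\theta}-1}-\frac{1}{\theta}\asymp-\frac{1}{\theta}\asymp-\frac{1}{2\beta_{1}}. \tag{30}$$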
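To see Theorem 5.1 numerically, the sketch below solves the stationarity condition $\theta=2\beta_{1}+2p\beta_{2}\left(K^{\prime}(\theta)\right)^{p-1}$ for Bernoulli $(.5)$, with $K(\theta)=\log\left((1+e^{\theta})/2\right)$, and for Uniform $(0,1)$, with $K(\theta)=\log\left((e^{\theta}-1)/\theta\right)$. It works along the ray $\beta_{1}=-2\beta_{2}$ (that is, $a=-2$) with $p=2$ and keeps the root with the largest value of $L$. The helper names, the ray, and the scan resolution are illustrative assumptions. The ratio $\theta/(2\beta_{1})$ approaches $1$ for both distributions, while $u$ behaves very differently.

```python
import math

def K_bern(t):
    # log E[e^{tX}] for X ~ Bernoulli(1/2), i.e. log((1 + e^t)/2), computed stably
    return max(t, 0.0) + math.log1p(math.exp(-abs(t))) - math.log(2.0)

def K1_bern(t):
    # K'(t) for Bernoulli(1/2): the logistic function
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def K_unif(t):
    # log E[e^{tX}] for X ~ Uniform(0,1), i.e. log((e^t - 1)/t); series near t = 0
    if abs(t) < 1e-3:
        return t / 2 + t * t / 24
    if t > 0:
        return t + math.log(-math.expm1(-t)) - math.log(t)
    return math.log(-math.expm1(t)) - math.log(-t)

def K1_unif(t):
    # K'(t) = 1/(1 - e^{-t}) - 1/t for Uniform(0,1); series near t = 0
    if abs(t) < 1e-3:
        return 0.5 + t / 12 - t ** 3 / 720
    return -1.0 / math.expm1(-t) - 1.0 / t

def global_max(b1, b2, p, K, K1, steps=4000):
    """(psi, theta, u) at the global maximiser of L, assuming b2 > 0.

    Stationary points solve condition (27); they all lie in [2*b1, 2*b1 + 2*p*b2],
    and local maxima of L are upward zero crossings of g.
    """
    def g(t):
        return t - 2 * b1 - 2 * p * b2 * K1(t) ** (p - 1)

    lo, hi = 2 * b1, 2 * b1 + 2 * p * b2
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [g(t) for t in grid]
    best = None
    for i in range(steps):
        if vals[i] < 0 <= vals[i + 1]:
            a, b = grid[i], grid[i + 1]
            for _ in range(100):
                c = 0.5 * (a + b)
                a, b = (c, b) if g(c) < 0 else (a, c)
            t = 0.5 * (a + b)
            u = K1(t)
            psi = b1 * u + b2 * u ** p - 0.5 * (t * u - K(t))   # I(u) = theta*u - K(theta)
            if best is None or psi > best[0]:
                best = (psi, t, u)
    return best

results = {}
for name, K, K1 in [("Bernoulli(.5)", K_bern, K1_bern), ("Uniform(0,1)", K_unif, K1_unif)]:
    for b2 in (5.0, 20.0, 80.0):
        b1 = -2.0 * b2                            # ray a = -2 < -1: sparse region
        psi, t, u = global_max(b1, b2, 2, K, K1)
        results[(name, b2)] = (t / (2 * b1), u)
        print(f"{name:14s} beta2={b2:5.1f}  theta/(2*beta1)={t / (2 * b1):.4f}  u={u:.3e}")
```

For Bernoulli $(.5)$ the optimal $u$ is exponentially small, whereas for Uniform $(0,1)$ it decays only like $-1/(2\beta_{1})$.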

Theorem 5.2

Consider a non-degenerate probability measure $\mu$ supported on $[0,1]$ and symmetric about the line $u=1/2$ . Assume the associated Cramér rate function (6) is bounded on $[0,1]$ (i.e. $I(0)=I(1)$ is finite). Take $H_{1}$ a single edge and $H_{2}$ a finite simple graph with $p\geq 2$ edges. Let $\beta_{1}>-\beta_{2}$ and $\beta_{2}\geq 0$ . For large $n$ and $(\beta_{1},\beta_{2})$ sufficiently far away from the origin, a typical graph drawn from the model looks like an Erdős-Rényi graph $G(n,u)$ , where the edge presence probability $u$ depends on the distribution $\mu$ , but its dual $\theta$ universally satisfies $\theta\asymp 2(\beta_{1}+p\beta_{2})$ .

Proof

Let $\beta_{1}=a\beta_{2}$ with $a>-1$. Resorting to Legendre duality, (11) gives condition (27) on $\theta$, the dual of $u$. By Proposition 3, $u\rightarrow 1$ for $(\beta_{1},\beta_{2})$ sufficiently far away from the origin, which corresponds to $\theta\rightarrow\infty$ in the dual space. Since $K^{\prime}(\theta)\rightarrow 1$ as $\theta\rightarrow\infty$ (Table 1), we have

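$$\frac{\theta}{2(\beta_{1}+p\beta_{2})}=\frac{a+p\left(K^{\prime}(\theta)\right)^{p-1}}{a+p}\rightarrow 1. \tag{31}$$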

The universal asymptotics of $\theta\asymp 2(\beta_{1}+p\beta_{2})$ is verified.

We claim that $u$, on the other hand, depends on the specific distribution $\mu$. We derive the asymptotics of $u$ in two special cases, Bernoulli $(.5)$ and Uniform $(0,1)$. In both cases, $u=K^{\prime}(\theta)$ by Legendre duality. For Bernoulli $(.5)$,

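$$1-u=\frac{1}{1+e^{\theta}},\qquad\text{so}\qquad-\log(1-u)\asymp\theta\asymp 2(\beta_{1}+p\beta_{2}). \tag{32}$$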

For Uniform $(0,1)$, on the other hand,

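$$1-u=1-K^{\prime}(\theta)=\frac{1}{\theta}-\frac{1}{e^{\theta}-1}\asymp\frac{1}{\theta}\asymp\frac{1}{2(\beta_{1}+p\beta_{2})}. \tag{33}$$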
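The nearly complete case can be checked the same way. The sketch below repeats the illustrative root-scanning helpers and moves along the ray $\beta_{1}=\beta_{2}$ (that is, $a=1$) with $p=2$. It reports $\theta/\left(2(\beta_{1}+p\beta_{2})\right)$ together with $1-u$; the ray and the scan resolution are arbitrary assumptions.

```python
import math

def K_bern(t):
    # log((1 + e^t)/2) for Bernoulli(1/2), computed stably
    return max(t, 0.0) + math.log1p(math.exp(-abs(t))) - math.log(2.0)

def K1_bern(t):
    # logistic function = K'(t) for Bernoulli(1/2)
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def K_unif(t):
    # log((e^t - 1)/t) for Uniform(0,1); series near t = 0
    if abs(t) < 1e-3:
        return t / 2 + t * t / 24
    if t > 0:
        return t + math.log(-math.expm1(-t)) - math.log(t)
    return math.log(-math.expm1(t)) - math.log(-t)

def K1_unif(t):
    # 1/(1 - e^{-t}) - 1/t for Uniform(0,1); series near t = 0
    if abs(t) < 1e-3:
        return 0.5 + t / 12 - t ** 3 / 720
    return -1.0 / math.expm1(-t) - 1.0 / t

def global_max(b1, b2, p, K, K1, steps=4000):
    """(psi, theta, u) at the global maximiser of L via condition (27); assumes b2 > 0."""
    def g(t):
        return t - 2 * b1 - 2 * p * b2 * K1(t) ** (p - 1)

    lo, hi = 2 * b1, 2 * b1 + 2 * p * b2
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [g(t) for t in grid]
    best = None
    for i in range(steps):
        if vals[i] < 0 <= vals[i + 1]:            # upward crossing = local max of L
            a, b = grid[i], grid[i + 1]
            for _ in range(100):
                c = 0.5 * (a + b)
                a, b = (c, b) if g(c) < 0 else (a, c)
            t = 0.5 * (a + b)
            u = K1(t)
            psi = b1 * u + b2 * u ** p - 0.5 * (t * u - K(t))
            if best is None or psi > best[0]:
                best = (psi, t, u)
    return best

results = {}
for name, K, K1 in [("Bernoulli(.5)", K_bern, K1_bern), ("Uniform(0,1)", K_unif, K1_unif)]:
    for b2 in (5.0, 20.0, 80.0):
        b1 = b2                                   # ray a = 1 > -1: nearly complete region
        psi, t, u = global_max(b1, b2, 2, K, K1)
        ratio = t / (2 * (b1 + 2 * b2))           # Theorem 5.2 predicts ratio -> 1
        results[(name, b2)] = (ratio, 1.0 - u)
        print(f"{name:14s} beta2={b2:5.1f}  ratio={ratio:.4f}  1-u={1.0 - u:.3e}")
```

For Bernoulli $(.5)$, $1-u$ underflows to $0$ in double precision, while for Uniform $(0,1)$ it is still of order $1/\theta$.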

See Tables 2 and 3, computed with $p=2$: even for $\beta$ of small magnitude, the asymptotic trend of the optimal $\theta$ (and hence of the optimal $u$) is evident. The asymptotic characterizations of $u$ obtained in Theorems 5.1 and 5.2 enable a deeper analysis of the asymptotics of the limiting normalization constant $\psi_{\infty}^{\beta}$ of the exponential model in Theorems 5.3 and 5.4 below. Interestingly, universality is observed only in the nearly complete region ($\beta_{1}>-\beta_{2}$ and $\beta_{2}\geq 0$) of the parameter space, as proven in Theorem 5.4, and not in the sparse region ($\beta_{1}<-\beta_{2}$ and $\beta_{2}\geq 0$), as shown in Theorem 5.3.

Before stating the theorems and their proofs, we offer a possible explanation for this discrepancy. By Theorem 4.1,

$$\psi_{\infty}^{\beta}=\beta_{1}u+\beta_{2}u^{p}-\frac{1}{2}I(u), \tag{34}$$

where $u$ is chosen to maximize the right-hand side. In statistical physics, $\beta_{1}u+\beta_{2}u^{p}$ is commonly referred to as the energy contribution and $-I(u)/2$ as the entropy contribution, the latter depending largely on the specific edge-weight distribution $\mu$. In the sparse region of the parameter space, the entropy contribution is at least as important as the energy contribution and, for many common distributions such as Bernoulli $(.5)$ and Uniform $(0,1)$, actually dominates it. Conversely, in the nearly complete region of the parameter space, the energy contribution dominates the entropy contribution. This leads to universality of $\psi_{\infty}^{\beta}$ in the nearly complete region but not in the sparse region.

Theorem 5.3

Consider a non-degenerate probability measure $\mu$ supported on $[0,1]$ and symmetric about the line $u=1/2$ . Take $H_{1}$ a single edge and $H_{2}$ a finite simple graph with $p\geq 2$ edges. Let $\beta_{1}<-\beta_{2}$ and $\beta_{2}\geq 0$ . For $(\beta_{1},\beta_{2})$ sufficiently far away from the origin, the limiting normalization constant $\psi_{\infty}^{\beta}$ depends on the distribution $\mu$ .

Proof

Let $\beta_{1}=a\beta_{2}$ with $a<-1$. Theorem 4.1 gives (34), where $u$ is chosen to maximize the right-hand side and $u\rightarrow 0$ for $(\beta_{1},\beta_{2})$ sufficiently far away from the origin. Resorting to Legendre duality, this gives

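$$\psi_{\infty}^{\beta}=\beta_{1}K^{\prime}(\theta)+\beta_{2}\left(K^{\prime}(\theta)\right)^{p}-\frac{1}{2}\left(\theta K^{\prime}(\theta)-K(\theta)\right), \tag{35}$$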

where $\theta$ is the dual of $u$ and approaches $-\infty$ when $(\beta_{1},\beta_{2})$ diverge. By (27),

$$\psi_{\infty}^{\beta}=\frac{1}{2}K(\theta)-(p-1)\beta_{2}\left(K^{\prime}(\theta)\right)^{p}. \tag{36}$$

Since $\beta_{2}\asymp\theta/(2a)$ as $\theta\rightarrow-\infty$ from Theorem 5.1, asymptotically we have

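$$\psi_{\infty}^{\beta}\asymp\frac{1}{2}K(\theta)-\frac{(p-1)\,\theta\left(K^{\prime}(\theta)\right)^{p}}{2a}. \tag{37}$$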

Remark 2

Many common distributions including Bernoulli $(.5)$ and Uniform $(0,1)$ satisfy $\theta K^{\prime}(\theta)/K(\theta)\rightarrow 0$ as $\theta\rightarrow-\infty$ , in which case the asymptotics in Theorem 5.3 may be further reduced to $\psi_{\infty}^{\beta}\asymp K(\theta)/2\asymp K(2\beta_{1})/2$ .
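A numerical comparison along the sparse ray $\beta_{1}=-2\beta_{2}$ with $p=2$ illustrates Remark 2. The sketch uses the same illustrative root-scanning helpers. For both distributions $\psi_{\infty}^{\beta}$ tracks $K(2\beta_{1})/2$, but this quantity stays near $-\frac{1}{2}\log 2$ for Bernoulli $(.5)$, while for Uniform $(0,1)$ it drifts to $-\infty$ like $-\frac{1}{2}\log(-2\beta_{1})$. This is the distribution dependence asserted in Theorem 5.3.

```python
import math

def K_bern(t):
    # log((1 + e^t)/2) for Bernoulli(1/2), computed stably
    return max(t, 0.0) + math.log1p(math.exp(-abs(t))) - math.log(2.0)

def K1_bern(t):
    # logistic function = K'(t) for Bernoulli(1/2)
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def K_unif(t):
    # log((e^t - 1)/t) for Uniform(0,1); series near t = 0
    if abs(t) < 1e-3:
        return t / 2 + t * t / 24
    if t > 0:
        return t + math.log(-math.expm1(-t)) - math.log(t)
    return math.log(-math.expm1(t)) - math.log(-t)

def K1_unif(t):
    # 1/(1 - e^{-t}) - 1/t for Uniform(0,1); series near t = 0
    if abs(t) < 1e-3:
        return 0.5 + t / 12 - t ** 3 / 720
    return -1.0 / math.expm1(-t) - 1.0 / t

def global_max(b1, b2, p, K, K1, steps=4000):
    """(psi, theta, u) at the global maximiser of L via condition (27); assumes b2 > 0."""
    def g(t):
        return t - 2 * b1 - 2 * p * b2 * K1(t) ** (p - 1)

    lo, hi = 2 * b1, 2 * b1 + 2 * p * b2
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [g(t) for t in grid]
    best = None
    for i in range(steps):
        if vals[i] < 0 <= vals[i + 1]:            # upward crossing = local max of L
            a, b = grid[i], grid[i + 1]
            for _ in range(100):
                c = 0.5 * (a + b)
                a, b = (c, b) if g(c) < 0 else (a, c)
            t = 0.5 * (a + b)
            u = K1(t)
            psi = b1 * u + b2 * u ** p - 0.5 * (t * u - K(t))   # equation (35)
            if best is None or psi > best[0]:
                best = (psi, t, u)
    return best

results = {}
for name, K, K1 in [("Bernoulli(.5)", K_bern, K1_bern), ("Uniform(0,1)", K_unif, K1_unif)]:
    for b2 in (5.0, 20.0, 80.0):
        b1 = -2.0 * b2                            # ray a = -2 < -1: sparse region
        psi, t, u = global_max(b1, b2, 2, K, K1)
        results[(name, b2)] = (psi, 0.5 * K(2 * b1))
        print(f"{name:14s} beta2={b2:5.1f}  psi={psi:.5f}  K(2*beta1)/2={0.5 * K(2 * b1):.5f}")
```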

Theorem 5.4

Consider a non-degenerate probability measure $\mu$ supported on $[0,1]$ and symmetric about the line $u=1/2$ . Assume the associated Cramér rate function (6) is bounded on $[0,1]$ (i.e. $I(0)=I(1)$ is finite). Take $H_{1}$ a single edge and $H_{2}$ a finite simple graph with $p\geq 2$ edges. Let $\beta_{1}>-\beta_{2}$ and $\beta_{2}\geq 0$ . For $(\beta_{1},\beta_{2})$ sufficiently far away from the origin, the limiting normalization constant $\psi_{\infty}^{\beta}$ universally satisfies $\psi_{\infty}^{\beta}\asymp\beta_{1}+\beta_{2}$ .

Proof

Let $\beta_{1}=a\beta_{2}$ with $a>-1$. As in the proof of Theorem 5.3, Theorem 4.1 gives (34), where $u$ is chosen to maximize the right-hand side and $u\rightarrow 1$ for $(\beta_{1},\beta_{2})$ sufficiently far away from the origin. Since the first two terms of (34) are asymptotic to $\beta_{1}+\beta_{2}$ while the last term is bounded by our assumption, the claim follows.

Remark 3

The boundedness assumption on $I$ in Theorem 5.4 is only used as a sufficient condition to ensure that $u\rightarrow 1$ for $\beta_{1}>-\beta_{2}$ in the upper half-plane and far away from the origin and is not necessary for the derivation of the universal asymptotics for $\psi_{\infty}^{\beta}$ . Indeed, since $\theta\asymp 2(\beta_{1}+p\beta_{2})$ by Theorem 5.2, using $K(\theta)/\theta\asymp K^{\prime}(\theta)\asymp 1$ in (36), we have

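$$\psi_{\infty}^{\beta}=\frac{1}{2}K(\theta)-(p-1)\beta_{2}\left(K^{\prime}(\theta)\right)^{p}\asymp(\beta_{1}+p\beta_{2})-(p-1)\beta_{2}=\beta_{1}+\beta_{2}. \tag{38}$$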

This universal asymptotic phenomenon is observed for example in Uniform $(0,1)$ , whose associated Cramér rate function $I$ is not bounded.
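As an illustration, the ratio $\psi_{\infty}^{\beta}/(\beta_{1}+\beta_{2})$ can be computed along the ray $\beta_{1}=\beta_{2}$ with $p=2$, both for Bernoulli $(.5)$ (bounded $I$) and for Uniform $(0,1)$ (unbounded $I$). The sketch reuses the same illustrative root-scanning approach. For the uniform case convergence is slower, because the entropy term $I(u)/2=\left(\theta u-K(\theta)\right)/2$ grows like $\frac{1}{2}\log\theta$.

```python
import math

def K_bern(t):
    # log((1 + e^t)/2) for Bernoulli(1/2), computed stably
    return max(t, 0.0) + math.log1p(math.exp(-abs(t))) - math.log(2.0)

def K1_bern(t):
    # logistic function = K'(t) for Bernoulli(1/2)
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def K_unif(t):
    # log((e^t - 1)/t) for Uniform(0,1); series near t = 0
    if abs(t) < 1e-3:
        return t / 2 + t * t / 24
    if t > 0:
        return t + math.log(-math.expm1(-t)) - math.log(t)
    return math.log(-math.expm1(t)) - math.log(-t)

def K1_unif(t):
    # 1/(1 - e^{-t}) - 1/t for Uniform(0,1); series near t = 0
    if abs(t) < 1e-3:
        return 0.5 + t / 12 - t ** 3 / 720
    return -1.0 / math.expm1(-t) - 1.0 / t

def global_max(b1, b2, p, K, K1, steps=4000):
    """(psi, theta, u) at the global maximiser of L via condition (27); assumes b2 > 0."""
    def g(t):
        return t - 2 * b1 - 2 * p * b2 * K1(t) ** (p - 1)

    lo, hi = 2 * b1, 2 * b1 + 2 * p * b2
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [g(t) for t in grid]
    best = None
    for i in range(steps):
        if vals[i] < 0 <= vals[i + 1]:            # upward crossing = local max of L
            a, b = grid[i], grid[i + 1]
            for _ in range(100):
                c = 0.5 * (a + b)
                a, b = (c, b) if g(c) < 0 else (a, c)
            t = 0.5 * (a + b)
            u = K1(t)
            psi = b1 * u + b2 * u ** p - 0.5 * (t * u - K(t))
            if best is None or psi > best[0]:
                best = (psi, t, u)
    return best

results = {}
for name, K, K1 in [("Bernoulli(.5)", K_bern, K1_bern), ("Uniform(0,1)", K_unif, K1_unif)]:
    for b2 in (5.0, 20.0, 80.0, 320.0):
        b1 = b2                                   # ray a = 1 > -1: nearly complete region
        psi, t, u = global_max(b1, b2, 2, K, K1)
        results[(name, b2)] = psi / (b1 + b2)
        print(f"{name:14s} beta2={b2:6.1f}  psi/(beta1+beta2)={psi / (b1 + b2):.4f}")
```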

In the nearly complete region of the parameter space ($\beta_{1}>-\beta_{2}$ and $\beta_{2}\geq 0$) examined in Theorems 5.2 and 5.4, the “asymptotically equivalent” Erdős-Rényi parameter $u$ depends on the edge-weight distribution $\mu$, yet the limiting normalization constant $\psi_{\infty}^{\beta}$ for the exponential random graph displays universal asymptotic behavior. Since the Erdős-Rényi model is not an exact statistical physics analog of the exponential model, this seemingly paradoxical discrepancy does not come as a surprise. We work out the details below for the standard $2$-parameter families with Bernoulli $(.5)$ edge weights; the calculation may be extended to $k$-parameter families with general edge-weight distributions.

Suppose the exponential random graph $G_{n}$ is indistinguishable in the large $n$ limit from an Erdős-Rényi random graph $G(n,u)$ in the graphon sense. In other words, for large $n$ , the $2$ -parameter exponential random graph $G_{n}$ is “equivalent” to a simplified $1$ -parameter Erdős-Rényi random graph with probability distribution

[TABLE]

where $u$ and $\beta^{\prime}$ are related by $u=e^{2\beta^{\prime}}/(1+e^{2\beta^{\prime}})$ . The limiting normalization constant $\psi_{\infty}^{\beta^{\prime}}$ for the Erdős-Rényi model is given by

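$$\psi_{\infty}^{\beta^{\prime}}=\beta^{\prime}u-\frac{1}{2}I(u),\qquad I(u)=u\log u+(1-u)\log(1-u)+\log 2, \tag{40}$$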

and the limiting normalization constant $\psi_{\infty}^{\beta}$ for the exponential random graph model is given by

$$\psi_{\infty}^{\beta}=\beta_{1}u+\beta_{2}u^{p}-\frac{1}{2}I(u), \tag{41}$$

Utilizing the fact that $u$ satisfies

$$\beta_{1}+p\beta_{2}u^{p-1}=\frac{1}{2}\log\frac{u}{1-u}=\beta^{\prime}, \tag{42}$$

we have

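$$\psi_{\infty}^{\beta}=\beta^{\prime}u-\frac{1}{2}I(u)-(p-1)\beta_{2}u^{p}=\psi_{\infty}^{\beta^{\prime}}-(p-1)\beta_{2}u^{p}. \tag{43}$$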

This shows that $\psi_{\infty}^{\beta}$ (for the exponential random graph model) and $\psi_{\infty}^{\beta^{\prime}}$ (for the corresponding Erdős-Rényi model) do not coincide unless $\beta_{2}=0$ . The difference is particularly noticeable in the nearly complete region, where $\beta^{\prime}\asymp\beta_{1}+p\beta_{2}$ when $\mu$ is Bernoulli $(.5)$ , and so $\psi_{\infty}^{\beta^{\prime}}\asymp\beta_{1}+p\beta_{2}$ but $\psi_{\infty}^{\beta}\asymp\beta_{1}+\beta_{2}$ .
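This relation between the two normalization constants can be checked numerically for Bernoulli $(.5)$ edge weights. The sketch assumes that the Erdős-Rényi constant is $\psi_{\infty}^{\beta^{\prime}}=\sup_{v}\left(\beta^{\prime}v-\frac{1}{2}I(v)\right)=\frac{1}{2}\log\frac{1+e^{2\beta^{\prime}}}{2}$ and takes $\beta^{\prime}=\beta_{1}+p\beta_{2}u^{p-1}$. It evaluates both constants at each local maximizer of $L$ and compares their gap with $(p-1)\beta_{2}u^{p}$. Helper names and test parameters are hypothetical.

```python
import math

def K_bern(t):
    # log E[e^{tX}] for X ~ Bernoulli(1/2), computed stably
    return max(t, 0.0) + math.log1p(math.exp(-abs(t))) - math.log(2.0)

def sigmoid(t):
    # K'(t): maps the dual variable theta to the edge density u
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def local_max_thetas(b1, b2, p, steps=4000):
    """Duals theta of the local maximisers of L: upward zero crossings of (27); b2 > 0."""
    def g(t):
        return t - 2 * b1 - 2 * p * b2 * sigmoid(t) ** (p - 1)

    lo, hi = 2 * b1, 2 * b1 + 2 * p * b2
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    vals = [g(t) for t in grid]
    roots = []
    for i in range(steps):
        if vals[i] < 0 <= vals[i + 1]:
            a, b = grid[i], grid[i + 1]
            for _ in range(100):
                c = 0.5 * (a + b)
                a, b = (c, b) if g(c) < 0 else (a, c)
            roots.append(0.5 * (a + b))
    return roots

def compare(b1, b2, p):
    """List of (psi_exp, psi_er, u) at each local maximiser, with beta' = b1 + p*b2*u**(p-1)."""
    out = []
    for t in local_max_thetas(b1, b2, p):
        u = sigmoid(t)
        psi_exp = b1 * u + b2 * u ** p - 0.5 * (t * u - K_bern(t))   # I(u) = theta*u - K(theta)
        beta_prime = b1 + p * b2 * u ** (p - 1)
        psi_er = 0.5 * K_bern(2 * beta_prime)   # assumed ER constant: sup_v (beta' v - I(v)/2)
        out.append((psi_exp, psi_er, u))
    return out

for b1, b2, p in [(-3.0, 2.0, 2), (-3.0, 2.0, 3), (1.0, 1.0, 2), (10.0, 10.0, 2)]:
    for psi_exp, psi_er, u in compare(b1, b2, p):
        print(f"beta=({b1}, {b2}), p={p}: u={u:.4f}  psi'-psi={psi_er - psi_exp:.6f}"
              f"  (p-1)*beta2*u^p={(p - 1) * b2 * u ** p:.6f}")
```

In the nearly complete case $(\beta_{1},\beta_{2})=(10,10)$ the gap is essentially $(p-1)\beta_{2}=10$, the discrepancy between $\beta_{1}+p\beta_{2}$ and $\beta_{1}+\beta_{2}$ noted above.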

Acknowledgements

The authors are very grateful to the anonymous referees for the invaluable suggestions that greatly improved the quality of this paper.

Bibliography

1. Aldous, D.: Representations for partially exchangeable arrays of random variables. J. Multivariate Anal. 11, 581-598 (1981)
2. Aldous, D., Lyons, R.: Processes on unimodular random networks. Electron. J. Probab. 12, 1454-1508 (2007)
3. Aldous, D., Steele, J.M.: The objective method: Probabilistic combinatorial optimization and local weak convergence. In: Kesten, H. (ed.) Probability on Discrete Structures, pp. 1-72. Springer, Berlin (2004)
4. Aristoff, D., Zhu, L.: Asymptotic structure and singularities in constrained directed graphs. Stochastic Process. Appl. 125, 4154-4177 (2015)
5. Benjamini, I., Schramm, O.: Recurrence of distributional limits of finite planar graphs. Electron. J. Probab. 6, 1-13 (2001)
6. Besag, J.: Statistical analysis of non-lattice data. J. R. Stat. Soc. Ser. D. Stat. 24, 179-195 (1975)
7. Bhamidi, S., Bresler, G., Sly, A.: Mixing time of exponential random graphs. Ann. Appl. Probab. 21, 2146-2170 (2011)
8. Borgs, C., Chayes, J., Cohn, H., Zhao, Y.: An $L^{p}$ theory of sparse graph convergence I. Limits, sparse random graph models, and power law distributions. arXiv:1401.2906 (2014)