Varying-coefficient models for dynamic networks

Jihui Lee; Gen Li; James D. Wilson

arXiv:1702.03632·stat.ME·April 10, 2017·Comput. Stat. Data Anal.

Varying-coefficient models for dynamic networks

Jihui Lee, Gen Li, James D. Wilson

PDF

1 Repo

TL;DR

This paper introduces VCERGM, a flexible and interpretable statistical model for dynamic networks that captures temporal and structural dependencies, enabling efficient inference and hypothesis testing.

Contribution

The paper develops VCERGM, a novel varying-coefficient exponential random graph model for dynamic networks, with maximum pseudo-likelihood fitting and bootstrap hypothesis testing.

Findings

01

Successfully applied to US Congress co-voting network

02

Analyzed resting-state brain connectivity data

03

Demonstrated advantages over existing methods

Abstract

Dynamic networks are commonly used in applications where relational data is observed over time. Statistical models for such data should capture not only the temporal dependencies between networks observed in time, but also the structural dependencies among the nodes and edges in each network. As a consequence, effectively making inference on dynamic networks is a computationally challenging task, and many models established for dynamic networks are intractable even for moderately sized networks. In this paper, we propose and investigate a family of dynamic network models, known as varying-coefficient exponential random graph models (VCERGMs), that characterize the evolution of network topology through smoothly varying parameters in an exponential family of distributions. The VCERGM provides an interpretable dynamic network model that enables the inference of temporal heterogeneity in a…

Figures30

Click any figure to enlarge with its caption.

Tables6

Table 1. Table 1: Examples of network statistics that can be specified in the VCERGM

Type	Network Statistic	Definition
Directed	Edge density	$\sum_{i \neq j} x_{i j}^{t} / {n_{t} (n_{t} - 1)}$
	Reciprocity	$\sum_{i < j} x_{i j}^{t} x_{j i}^{t} / (\frac{n_{t}}{2})$
	Cyclic triad	$\sum_{i < j < k} x_{i j}^{t} x_{j k}^{t} x_{k i}^{t} / {(\frac{n_{t}}{3}) \times 2}$
Undirected	Two-star	$\sum_{i < j < k} x_{i j}^{t} x_{j k}^{t} / {(\frac{n_{t}}{3}) \times 3}$
	Triangle	$\sum_{i < j < k} x_{i j}^{t} x_{j k}^{t} x_{k i}^{t} / (\frac{n_{t}}{3})$

Table 2. Table 2: Simulation results : Proportion of cases that we reject the null hypothesis out of 100 simulations at the significance level of α = 0.05 𝛼 0.05 \alpha=0.05 . Bootstrap samples of size B = 1000 𝐵 1000 B=1000 and chi-squared asymptotic distribution is used to make a decision for hypothesis testing.

$M$	Bootstrap					Chi-squared asymptotic
$M$	$K = 10$	30	50	70	100	$K = 10$	30	50	70	100
0	0.03	0.01	0.01	0.03	0	0	0	0.01	0.03	0
0.05	0.15	0.36	0.46	0.59	0.71	0.01	0.23	0.36	0.52	0.66
0.1	0.42	0.77	0.91	0.93	0.97	0.1	0.71	0.9	0.9	0.98
0.15	0.74	0.98	1	1	0.99	0.35	0.97	1	1	1
0.2	0.98	1	1	1	1	0.71	1	1	1	1
0.25	1	1	1	1	1	0.91	1	1	1	1
0.3	1	1	1	1	1	0.99	1	1	1	1

Table 3. Table 3: Simulation results with 30 nodes and (0, 1, 5, 10) missing networks : Mean and standard deviation of the integrated absolute errors (IAE) for each method.

	Missing	Edges			Reciprocity
	Missing	ERGM	ERGM2	VCERGM	ERGM	ERGM2	VCERGM
Sinusoidal	0	12.92 (7.52)	7.55 (8.26)	5.58 (7.48)	14.35 (4.21)	8.01 (4.9)	6.3 (1.91)
	1		7.84 (8.26)	6.42 (7.32)		8.34 (4.86)	6.25 (1.91)
	5		8.44 (8.29)	6.13 (7.46)		9.08 (4.79)	7.21 (1.75)
	10		12.15 (7.68)	11.89 (6.75)		11.37 (3.92)	9.47 (1.57)
Quadratic	0	6.16 (3.93)	2.75 (4.43)	2.72 (4.41)	8.16 (0.88)	2.74 (1.47)	3.11 (1.18)
	1		2.77 (4.42)	2.75 (4.4)		2.75 (1.49)	3.12 (1.18)
	5		2.89 (4.44)	2.92 (4.41)		2.83 (1.44)	3.25 (1.13)
	10		3.7 (4.32)	3.64 (4.29)		3.44 (1.55)	3.68 (1.19)
Erdős-Rényi	0	15.46 (9.83)	6.74 (11.32)	6.13 (11.34)	15.45 (2.24)	5.14 (2.43)	4.58 (1.43)
	1		6.74 (11.34)	6.18 (11.35)		5.12 (2.51)	4.62 (1.47)
	5		6.8 (11.29)	6.28 (11.28)		5.27 (2.5)	4.76 (1.53)
	10		6.98 (11.33)	6.46 (11.27)		5.73 (2.98)	5 (1.64)
Non-smooth	0	26.42 (3.37)	25.53 (3.47)	33.35 (1.92)	20.95 (4.44)	19.65 (4.17)	23.28 (2.95)
	1		26.21 (3.44)	33.56 (1.89)		19.86 (4.17)	23.37 (3.01)
	5		26.71 (3.43)	32.82 (1.93)		20.16 (4.2)	23.56 (2.94)
	10		34.79 (2.92)	36.01 (1.76)		22.17 (3.9)	23.38 (2.88)

Table 4. Table 4: Computing Time : Summary (Mean(SD)) of computing time (second) for dynamic networks with different number of time points K 𝐾 K

	Number of time points $K$
	20	40	60	80	100
ERGM	1.00 (0.05)	2.09 (0.08)	3.25 (0.13)	4.25 (0.11)	5.30 (0.12)
VCERGM	0.43 (0.03)	0.82 (0.04)	1.25 (0.07)	1.63 (0.07)	2.01 (0.06)

Table 5. Table 5: Simulation results with 50 nodes and (0, 1, 5, 10) missing networks : Mean and standard deviation of the integrated absolute errors (IAE) for each method.

	Missing	Edges			Reciprocity
	Missing	ERGM	ERGM2	VCERGM	ERGM	ERGM2	VCERGM
Sinusoidal	0	10.56 (16.1)	7.74 (16.88)	7.61 (16.92)	8.23 (3.22)	4.87 (4)	4.78 (3.96)
	1		7.91 (16.81)	8.28 (16.71)		5.32 (3.9)	5.01 (3.9)
	5		9 (16.55)	8.63 (16.65)		6.36 (3.66)	6.3 (3.54)
	10		13.17 (15.62)	13.9 (15.4)		9.26 (3.05)	8.92 (2.95)
Quadratic	0	6.02 (11.37)	4.14 (11.82)	4.17 (11.8)	5.18 (1.51)	2.14 (1.96)	2.42 (1.84)
	1		4.17 (11.79)	4.19 (11.78)		2.17 (1.97)	2.44 (1.86)
	5		4.39 (11.72)	4.39 (11.71)		2.29 (1.95)	2.58 (1.83)
	10		5.09 (11.54)	5.12 (11.54)		2.77 (1.93)	2.93 (1.84)
Erdős-Rényi	0	14.7 (22.94)	10.03 (24.32)	10.3 (24.25)	8.88 (1.55)	2.77 (1.21)	3.25 (0.89)
	1		10 (24.33)	10.33 (24.24)		2.79 (1.21)	3.3 (0.88)
	5		10.12 (24.33)	10.46 (24.24)		2.8 (1.26)	3.34 (0.91)
	10		10.01 (24.34)	10.43 (24.22)		2.97 (1.44)	3.4 (0.85)
Non-smooth	0	25.88 (7.26)	25.35 (7.34)	32.74 (6.16)	19.87 (8.52)	19.48 (8.61)	23.25 (7.21)
	1		26.34 (7.17)	33.1 (6.07)		19.8 (8.58)	23.47 (7.22)
	5		26.46 (7.13)	32.25 (6.2)		19.77 (8.68)	23.61 (7.29)
	10		34.87 (5.9)	36.28 (5.59)		22.64 (7.81)	23.98 (7.18)

Table 6. Table 6: Simulation results with 100 nodes and (0, 1, 5, 10) missing networks : Mean and standard deviation of the integrated absolute errors (IAE) for each method.

	Missing	Edges			Reciprocity
	Missing	ERGM	ERGM2	VCERGM	ERGM	ERGM2	VCERGM
Sinusoidal	0	25.96 (35.76)	25.54 (36.02)	25.51 (36.04)	8.88 (6.62)	8.19 (7.03)	8.14 (7.01)
	1		25.38 (36.09)	25.44 (36.05)		8.45 (6.82)	8.39 (6.85)
	5		26.3 (35.63)	26.28 (35.63)		8.86 (6.61)	8.78 (6.5)
	10		29.13 (34.25)	29.29 (34.16)		11.69 (4.91)	11.61 (4.85)
Quadratic	0	15.45 (27.98)	14.96 (28.23)	15.04 (28.18)	4.29 (3.36)	3.13 (3.78)	3.24 (3.63)
	1		14.87 (28.24)	14.94 (28.21)		3.23 (3.71)	3.32 (3.58)
	5		15 (28.13)	15.03 (28.11)		3.34 (3.71)	3.41 (3.6)
	10		15.63 (27.81)	15.63 (27.81)		3.77 (3.66)	3.83 (3.55)
Erdős-Rényi	0	37.23 (47.52)	36.48 (48.06)	36.61 (47.97)	3.6 (0.92)	1.02 (0.5)	1.48 (0.39)
	1		36.46 (48.07)	36.59 (47.99)		0.99 (0.46)	1.5 (0.4)
	5		36.67 (47.94)	36.78 (47.87)		1 (0.47)	1.5 (0.39)
	10		36.42 (48.02)	36.58 (47.91)		1.27 (0.56)	1.67 (0.45)
Non-smooth	0	32.48 (19.59)	32.64 (19.57)	37.05 (18.12)	27.37 (15.95)	27.51 (15.8)	28 (14.77)
	1		33.48 (19.26)	37.57 (17.9)		27.78 (15.74)	28.31 (14.77)
	5		33.25 (19.35)	36.69 (18.2)		27.61 (15.77)	28.37 (14.74)
	10		40.36 (17.2)	41.84 (16.62)		29.54 (14.52)	29.37 (14.18)

Equations68

P (X = x ∣ ϕ) = \frac{exp { ϕ ^{T} h ( x )}}{z \in { 0 , 1 } ^{n \times n} \sum exp { ϕ ^{T} h ( z )}},

P (X = x ∣ ϕ) = \frac{exp { ϕ ^{T} h ( x )}}{z \in { 0 , 1 } ^{n \times n} \sum exp { ϕ ^{T} h ( z )}},

P (X_{t} = x_{t} ∣ X_{t}^{-} = x_{t}^{-}) = P (X_{t} = x_{t} ∣ X_{t - 1} = x_{t - 1})

P (X_{t} = x_{t} ∣ X_{t}^{-} = x_{t}^{-}) = P (X_{t} = x_{t} ∣ X_{t - 1} = x_{t - 1})

P (X_{t} = x_{t} ∣ X_{t}^{-}; ϕ) = P (X_{t} = x_{t} ∣ X_{t - 1} = x_{t - 1}; ϕ) = \frac{exp { ϕ ^{T} g ( x _{t} , x _{t - 1} )}}{z \in { 0 , 1 } ^{n \times n} \sum exp { ϕ ^{T} g ( z , x _{t - 1} )}},

P (X_{t} = x_{t} ∣ X_{t}^{-}; ϕ) = P (X_{t} = x_{t} ∣ X_{t - 1} = x_{t - 1}; ϕ) = \frac{exp { ϕ ^{T} g ( x _{t} , x _{t - 1} )}}{z \in { 0 , 1 } ^{n \times n} \sum exp { ϕ ^{T} g ( z , x _{t - 1} )}},

P (X = x ∣ ϕ) = P (X_{1} = x_{1} ∣ ϕ) t = 2 \prod T P (X_{t} = x_{t} ∣ X_{t - 1} = x_{t - 1}, ϕ) .

P (X = x ∣ ϕ) = P (X_{1} = x_{1} ∣ ϕ) t = 2 \prod T P (X_{t} = x_{t} ∣ X_{t - 1} = x_{t - 1}, ϕ) .

P (X_{t} = x_{t} ∣ ϕ (t)) = \frac{exp { ϕ ( t ) ^{T} h ( x _{t} , n _{t} )}}{\sum _{z \in {0, 1}^{n_{t} \times n_{t}}} exp { ϕ ( t ) ^{T} h ( z , n _{t} )}}, x_{t} \in {0, 1}^{n_{t} \times n_{t}} .

P (X_{t} = x_{t} ∣ ϕ (t)) = \frac{exp { ϕ ( t ) ^{T} h ( x _{t} , n _{t} )}}{\sum _{z \in {0, 1}^{n_{t} \times n_{t}}} exp { ϕ ( t ) ^{T} h ( z , n _{t} )}}, x_{t} \in {0, 1}^{n_{t} \times n_{t}} .

ϕ_{k} (t) = ℓ = 1 \sum q Φ_{k ℓ} B_{ℓ} (t),

ϕ_{k} (t) = ℓ = 1 \sum q Φ_{k ℓ} B_{ℓ} (t),

ϕ (t) = Φ B (t) .

ϕ (t) = Φ B (t) .

PL (Φ ∣ x_{s}) = i, j \in [n] \prod P (X_{ij}^{s} = x_{ij}^{s} ∣ X_{- (ij)}^{s} = x_{- (ij)}^{s}) .

PL (Φ ∣ x_{s}) = i, j \in [n] \prod P (X_{ij}^{s} = x_{ij}^{s} ∣ X_{- (ij)}^{s} = x_{- (ij)}^{s}) .

PL (Φ ∣ x) = s = t_{1} \prod t_{K} i, j \in [n] \prod P (X_{ij}^{s} = x_{ij}^{s} ∣ X_{- (ij)}^{s} = x_{- (ij)}^{s}) .

PL (Φ ∣ x) = s = t_{1} \prod t_{K} i, j \in [n] \prod P (X_{ij}^{s} = x_{ij}^{s} ∣ X_{- (ij)}^{s} = x_{- (ij)}^{s}) .

logit {P (X_{ij}^{s} = 1∣ X_{- (ij)}^{s} = x_{- (ij)}^{s})}

logit {P (X_{ij}^{s} = 1∣ X_{- (ij)}^{s} = x_{- (ij)}^{s})}

= lo g [exp {ϕ (t)^{T} (h (x_{s, ij}^{+}) - h (x_{s, ij}^{-}))}]

= ϕ_{s}^{T} Δ_{ij}^{s}

Y_{s} = Δ_{s}^{T} Φ B_{s} = (B_{s} \otimes Δ_{s})^{T} vec (Φ), s = t_{1}, \dots, t_{K} .

Y_{s} = Δ_{s}^{T} Φ B_{s} = (B_{s} \otimes Δ_{s})^{T} vec (Φ), s = t_{1}, \dots, t_{K} .

H = B_{t_{1}} \otimes Δ_{t_{1}} ⋮ B_{t_{K}} \otimes Δ_{t_{K}} .

H = B_{t_{1}} \otimes Δ_{t_{1}} ⋮ B_{t_{K}} \otimes Δ_{t_{K}} .

P (Φ_{(k)}) = \int {D^{2} ϕ_{k} (u)}^{2} d u = Φ_{(k)}^{T} Ω Φ_{(k)}

P (Φ_{(k)}) = \int {D^{2} ϕ_{k} (u)}^{2} d u = Φ_{(k)}^{T} Ω Φ_{(k)}

\bm{\Omega}=\Bigl{\{}\Omega_{ij}=\int\{D^{2}B_{i}(u)\}\{D^{2}B_{j}(u)\}\,du\,;\,\,i,\,j=1,\ldots,q\Bigr{\}}.

\bm{\Omega}=\Bigl{\{}\Omega_{ij}=\int\{D^{2}B_{i}(u)\}\{D^{2}B_{j}(u)\}\,du\,;\,\,i,\,j=1,\ldots,q\Bigr{\}}.

Ω_{ij} = s = t_{1} \sum t_{K} {D^{2} B_{i} (s)} {D^{2} B_{j} (s)}, i, j = 1, \dots, q .

Ω_{ij} = s = t_{1} \sum t_{K} {D^{2} B_{i} (s)} {D^{2} B_{j} (s)}, i, j = 1, \dots, q .

P_{Ω} (Φ) = k = 1 \sum p Φ_{(k)}^{T} Ω Φ_{(k)} = vec (Φ)^{T} (Ω \otimes I_{p}) vec (Φ)

P_{Ω} (Φ) = k = 1 \sum p Φ_{(k)}^{T} Ω Φ_{(k)} = vec (Φ)^{T} (Ω \otimes I_{p}) vec (Φ)

y^{T} H vec (Φ) - lo g [1 + exp {H vec (Φ)}] - λ P_{Ω} (Φ) .

y^{T} H vec (Φ) - lo g [1 + exp {H vec (Φ)}] - λ P_{Ω} (Φ) .

H_{0} : ϕ_{1} (t) = ϕ_{1}^{0}, \dots, ϕ_{p} (t) = ϕ_{p}^{0} .

H_{0} : ϕ_{1} (t) = ϕ_{1}^{0}, \dots, ϕ_{p} (t) = ϕ_{p}^{0} .

H_{0} : Φ = Φ^{0} = (ϕ_{1}^{0}, \dots, ϕ_{p}^{0})^{^{T}} \times 1_{q}^{^{T}},

H_{0} : Φ = Φ^{0} = (ϕ_{1}^{0}, \dots, ϕ_{p}^{0})^{^{T}} \times 1_{q}^{^{T}},

lo g PL (Φ ∣ x)

lo g PL (Φ ∣ x)

T

T

T^{* (b)} = 2 {lo g PL (\hat{Φ}_{H1}^{* (b)} ∣ x^{* (b)}) - lo g PL (\hat{Φ}_{H0}^{* (b)} ∣ x^{* (b)})}, b = 1, \dots, B .

T^{* (b)} = 2 {lo g PL (\hat{Φ}_{H1}^{* (b)} ∣ x^{* (b)}) - lo g PL (\hat{Φ}_{H0}^{* (b)} ∣ x^{* (b)})}, b = 1, \dots, B .

p -value = \frac{\sum _{b = 1}^{B} I ( T < T ^{* (b)} )}{B} .

p -value = \frac{\sum _{b = 1}^{B} I ( T < T ^{* (b)} )}{B} .

\phi(t)=M\sin\Bigl{(}\frac{2\pi t}{T}\Bigr{)},\quad t\in[0,T].

\phi(t)=M\sin\Bigl{(}\frac{2\pi t}{T}\Bigr{)},\quad t\in[0,T].

I A E (ϕ (t), \hat{ϕ} (t)) = s = t_{1} \sum t_{K} ∣ ϕ (s) - \hat{ϕ} (s) ∣.

I A E (ϕ (t), \hat{ϕ} (t)) = s = t_{1} \sum t_{K} ∣ ϕ (s) - \hat{ϕ} (s) ∣.

P (X_{t} = x_{t} ∣ ϕ_{1} (t), x_{t - 1})

P (X_{t} = x_{t} ∣ ϕ_{1} (t), x_{t - 1})

g (x_{t}, x_{t - 1}) = h (x_{t}) - h (x_{t - 1}), t = 2, \dots, T,

g (x_{t}, x_{t - 1}) = h (x_{t}) - h (x_{t - 1}), t = 2, \dots, T,

X_{t} ∣ X_{t}^{-} \sim P (X_{t} = x_{t} ∣ X_{t - 1} = x_{t - 1}; ϕ) = \frac{exp { ϕ ^{T} g ( x _{t} , x _{t - 1} )}}{z \in { 0 , 1 } ^{n \times n} \sum exp { ϕ ^{T} g ( z , x _{t - 1} )}} .

X_{t} ∣ X_{t}^{-} \sim P (X_{t} = x_{t} ∣ X_{t - 1} = x_{t - 1}; ϕ) = \frac{exp { ϕ ^{T} g ( x _{t} , x _{t - 1} )}}{z \in { 0 , 1 } ^{n \times n} \sum exp { ϕ ^{T} g ( z , x _{t - 1} )}} .

X_{t} ∣ X_{t}^{-} \sim P (X_{t} = x ∣ ϕ) = \frac{exp { ϕ ^{T} h ( x )}}{z \in { 0 , 1 } ^{n \times n} \sum exp { ϕ ^{T} h ( z )}}

X_{t} ∣ X_{t}^{-} \sim P (X_{t} = x ∣ ϕ) = \frac{exp { ϕ ^{T} h ( x )}}{z \in { 0 , 1 } ^{n \times n} \sum exp { ϕ ^{T} h ( z )}}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

jihuilee/VCERGM
none

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Varying-Coefficient Models for Dynamic Networks

Jihui Lee

Department of Biostatistics, Columbia University

Gen Li

Department of Biostatistics, Columbia University

and

James D. Wilson

Department of Mathematics and Statistics, University of San Francisco

The authors gratefully acknowledge that Gen Li’s research was partially supported by the Calderone Junior Faculty Award from the Mailman School of Public Health at Columbia University.

Abstract

Dynamic networks are commonly used in applications where relational data is observed over time. Statistical models for such data should capture not only the temporal dependencies between networks observed in time, but also the structural dependencies among the nodes and edges in each network. As a consequence, effectively making inference on dynamic networks is a computationally challenging task, and many models established for dynamic networks are intractable even for moderately sized networks. In this paper, we propose and investigate a family of dynamic network models, known as varying-coefficient exponential random graph models (VCERGMs), that characterize the evolution of network topology through smoothly varying parameters in an exponential family of distributions. The VCERGM provides an interpretable dynamic network model that enables the inference of temporal heterogeneity in a dynamic network. We establish how to fit the VCERGM through maximum pseudo-likelihood techniques, and thus provide a computationally tractable method for statistical inference of complex dynamic networks. We furthermore devise a bootstrap hypothesis testing framework for testing the temporal heterogeneity of an observed dynamic network sequence. We apply the VCERGM to the US Congress co-voting network and a resting-state brain connectivity case study and show that our method provides relevant and interpretable patterns describing each data set. Comprehensive simulation studies demonstrate the advantages of our proposed method over existing methods.

Keywords: Exponential random graph model, Temporal graphs, Basis spline, Pseudo likelihood, Penalized logistic regression

1 Introduction

Networks have been extensively used to explore, model, and analyze the relational structure of individual units, or actors, in a complex system. In a network model, nodes represent the actors of the system, and edges are placed between nodes if the corresponding actors share a relationship. In many applications, the relationships among the actors of a modeled system change over time, necessitating the use of dynamic networks. Two diverse examples, which we analyze later in our application study, include the Congressional co-voting networks in Figure 1 and resting state brain connectivity networks in Figure 2. A prominent way to analyze relational network systems is through the use of probabilistic models, or graphical models, which describe the generative mechanism of an observed network. Although there is a rich body of literature on graphical models for static networks [15, 16], the development of interpretable and computationally tractable models for dynamic networks is in its early stages.

An important feature of dynamic networks that needs to be captured in any statistical model is the extent to which its local and global features change through time. We refer to this property as temporal heterogeneity. Heterogeneity directly affects the underlying process that best describes the formation of network. In parametric models, heterogeneity may result in significant changes in parameters that characterize the observed network. Consider the U.S. Senate co-voting network shown in Figure 1. One can readily observe an evolution of the network to form distinct clusters of Republicans and Democrats by the 113th Congress. Moreover, this configuration is in stark contrast with the sparse, seemingly random configuration formed in the 40th Congress. On the other hand, the resting state functional magnetic resonance imaging (fMRI) network shown in Figure 2 remains fairly stable through time with only minor local changes in edge formation. These contrasting examples exemplify the need to explicitly model the heterogeneity of a network. We further analyze these dynamic networks in Sections 6.1 and 6.2.

The examples above highlight the importance of accounting for evolving structure when modeling a dynamic network. In this paper, we propose a probabilistic model for dynamic networks called the varying-coefficient exponential random graph model (VCERGM). The model parameterizes time-varying topological features of dynamic networks in continuous time. Our model builds on two major statistical methodologies. One is the exponential family of random graph models [24, 48] that characterizes the marginal effect of local and global network features on the likelihood of the network. The other major component is a varying-coefficient specification [21], which flexibly models the changes of effect parameters over time. The VCERGM characterizes the temporal heterogeneity of dynamic network by modeling the parameter associated with each topological feature as a smooth function of time.

By quantifying temporal heterogeneity of a network via fluctuating parameters, we are able to analyze key properties of the local and global features of a dynamic network through the VCERGM. In addition to serving as a means to test for heterogeneity of a dynamic sequence, our model can also be directly used for interpolation of missing networks or edges. For networks at unobserved time points, the VCERGM provides robust estimates that reflect the structure of the unobserved networks without being strongly influenced by outliers in the sequence. Importantly, estimation of the VCERGM can be done with maximum pseudo-likelihood estimation (MPLE), enabling computationally efficient inference for dynamic networks with many nodes or time points.

Several dynamic network models that have been investigated. We briefly describe these here. The exponential random graph model (ERGM) is a family of probability distributions on unweighted static network. The ERGM has been adapted to dynamic networks in the pivotal work of [20]. The method is called the temporal exponential random graph model (TERGM). The TERGM models the difference in topological features between every two consecutive networks in a similar fashion to the ERGM. However, it ignores the heterogeneity of the differences, and cannot fully capture the time-varying patterns of the network structure. In fact, we show that in a wide range of situations the TERGM degenerates to a collection of independent and identically distributed ERGMs (see supplementary materials).

The TERGM has been further investigated in many different perspectives. [19] devised the hidden TERGM (HTERGM), which utilizes a hidden Markov process to express the nature of rewiring networks and model a time-specific network topology. [32] generalized the TERGM to the separable TERGM (STERGM). The STERGM flexibly models the formation and dissolution of networks by separately parameterizing prevalence and duration of fluctuations.

Another method for dynamic network modeling is the stochastic actor-oriented model (SAOM) [41]. It provides an alternative to dyadic models and instead is a localized actor-based model, which characterizes network evolution as a consequence of each actors’ connectivity. Even if the SAOM considers the fluctuation between two time points, it does not provide explicit form to parametrize the fluctuation in network topology. [37] and [38] generalized the latent space model developed by [22] to dynamic networks. The dynamics of network structure is modeled through random effects in a latent space. It focuses on the transition between two time points and provides limited description on overall network.

Compared to time-invariant models described above, an alternative actor-based model was introduced in [23], where dynamic networks are modeled using multilinear tensor regression. This work adapted autoregressive models to dictate temporal dependence in a sequence of networks, and like the SAOM, proposed an actor-based dependence structure between edges in each network. It directly models the temporal heterogeneity but may not be adequate for larger networks due to its computational complexity. In the meantime, [29] emphasizes on capturing time-varying attributes of dynamic networks and parametrizes the evolving relationship of each edge between nodes as a smooth function of time. Along with kernel smoothing approach, the $\ell_{1}$ -regularization is utilized to ensure the smoothness. The parameters in the model provide a valuable intuition in understanding the topological change of each edge, but fitting this model for larger networks can be computationally expensive considering the number of parameters.

The VCERGM exploits a varying-coefficient framework to model the temporal heterogeneity of topological features. In general, varying-coefficient models form a family of semi-parametric models in which the coefficient of some parametric model evolves with characteristics of the coefficient in a nonparametric manner. Varying-coefficient models were first used to characterize non-linear effects of covariates on real-valued response variables [21]. Later it was extended to the dynamic generalized linear models [25, 52]. A detailed review of varying-coefficient models and their applications are provided in [12]. In our proposed model, we model the coefficients of the topological features of a network as a function of time. The marginal parametric model takes the form of an exponential random graph model (ERGM). In this setting, the varying coefficients effectively capture the dynamic pattern of network structure by modeling the variation in the sufficient statistics of an observed sequence of networks.

The remainder of this paper is organized as follows. In Section 2 we introduce the varying-coefficient exponential random graph model for dynamic networks. Section 3 describes the pseudo-likelihood estimation of the VCERGM via penalized logistic regression. A hypothesis testing procedure for formally testing temporal fluctuation in a sequence of time-varying graphs is introduced in Section 4. We assess the performance of the VCERGM on a series of simulated dynamic networks, comparing it to alternative generative graph models in Section 5. In Section 6, we apply the VCERGM to the functional magnetic resonance imaging (fMRI) and co-voting networks and analyze the dynamic nature of each of these applications. We conclude with a discussion of our method and open areas for future research in Section 7.

2 Model

We begin by describing the exponential family of random graph models (ERGMs) and their temporal extension, the TERGM, since our proposed model is closely related to these specifications. We then describe our proposed model, the VCERGM.

2.1 Exponential Random Graph Models

Suppose that $X\in\{0,1\}^{n\times n}$ is an unweighted network with $n$ vertices, whose $(i,j)$ th entry $X_{ij}$ is an indicator that specifies whether or not node $i$ and node $j$ are connected by an edge. The ERGM is a probability distribution that characterizes the likelihood of $X$ via a function of network statistics $\mathbf{h}:\{0,1\}^{n\times n}\rightarrow\mathbb{R}^{p}$ that describe the topological structure of $X$ . One may include, for example, a statistic that quantifies the tendency of the nodes in the network to exhibit reciprocal ties using the reciprocity statistic $h_{1}(X)=\sum_{i\neq j}X_{ij}X_{ji}$ . [27] describes a means of determining which network statistics to include in a model based on collection of goodness of fit diagnostics. Given $\mathbf{h}$ , the ERGM posits that $X$ is a binary random matrix generated from the following probability mass function

[TABLE]

where $\bm{\phi}\in\mathbb{R}^{p}$ parameterizes the influence of the network statistics $\mathbf{h}(X)$ on the likelihood of $X$ . The ERGM has been successfully applied in a wide variety of fields, ranging from social networks to brain connectivity networks [18, 39, 44]. Recent tutorials of exponential random graph models and their applications are provided in [5, 14, 36].

The TERGM is an important and popular statistical model for inference of dynamic networks [9, 20], and can be described as follows. Consider a dynamic network $\mathbf{X}=\{X_{1},X_{2},\ldots,X_{T}\}$ that is observed at $T$ discrete and non-overlapping time periods, where each graph $X_{t}\in\{0,1\}^{n\times n}$ from $\mathbf{X}$ is unweighted, and observed for the set of vertices $[n]=\{1,\ldots,n\}$ . The TERGM is a generative model for $\mathbf{X}$ that characterizes the conditional probability of $X_{t}$ given $\mathbf{X}_{t}^{-}=\{X_{s}:s=1,\ldots,t-1\}$ via an exponential family of probability distributions. Under the first order TERGM, $\mathbf{X}$ exhibits a one-step Markov dependence between sequential networks as follows.

[TABLE]

Under (2), one can fully specify the joint probability mass function of $\mathbf{X}$ by parameterizing the one-step transitions from $X_{t-1}$ to $X_{t}$ . One models these dependencies using a function of transition statistics $\mathbf{g}:\{0,1\}^{n\times n}\times\{0,1\}^{n\times n}\rightarrow\mathbb{R}^{p}$ . These statistics represent the temporal potential over cliques across two sequential networks and can represent, for example, the change in the clustering or the change in overall connectivity between each pair of networks. For a chosen $\mathbf{g}$ , the first-order TERGM specifies the likelihood of $X_{t}\mid X_{t-1}$ for $t=2,\ldots,T$ as

[TABLE]

where $\bm{\phi}\in\mathbb{R}^{p}$ parameterizes the influence of the transition statistics $\mathbf{g}(X_{t},X_{t-1})$ on the conditional likelihood of $X_{t}$ given $X_{t-1}$ . Suppose that the marginal distribution $\mathbb{P}(X_{1}=x_{1}\mid\bm{\phi})$ is specified. The TERGM characterizes the joint distribution of the dynamic sequence $\mathbf{X}$ by

[TABLE]

We note that in general if one is able to specify appropriate transition statistics, then the TERGM in (3) and (4) is readily generalized to higher-order Markov dependency; however, for discussion of our equivalence statement in the supplementary materials, we consider the first-order TERGM only.

2.2 Varying-Coefficient Exponential Random Graph Models

Let $\mathbf{X}=\{X_{t}:0\leq t\leq T\}$ be a stochastic sequence of temporally ordered networks observed continuously up to some time $T>0$ . At each time point $t$ , $X_{t}\in\{0,1\}^{n_{t}\times n_{t}}$ represents an unweighted, directed or undirected network with $n_{t}$ nodes. Our goal is to provide a dynamic network model for $\mathbf{X}$ that directly accounts for the temporal heterogeneity of its local and global network structure.

The VCERGM consists of two components - (i) an ERGM representation for the marginal likelihood of each observed network, and (ii) the coupling of networks over time via a varying-coefficient model, where the coefficients at time $t$ parameterize the marginal likelihood of the network $X_{t}$ . We first specify a set of functions $\mathbf{h}(x_{t},n_{t}):\{0,1\}^{n_{t}\times n_{t}}\rightarrow\mathbb{R}^{p}$ for $t\in[0,T]$ , which quantify the same $p$ topological features of network $x_{t}$ with size $n_{t}$ . Given $\mathbf{h}(x_{t},n_{t})$ and the coefficient vector $\bm{\phi}(t)=(\phi_{1}(t),\cdots,\phi_{p}(t))^{T}\in\mathbb{R}^{p}$ , the marginal likelihood of $X_{t}$ at time $t$ has an ERGM representation given by

[TABLE]

A large collection of topological features can be used in the VCERGM. Traditionally, the network statistics are raw counts of different features in an observed network, such as the number of edges (edge density), the number of triangles, or the number of reciprocal edges in a directed network. However, in dynamic networks, networks at different time points may have differing numbers of nodes. Thus it is not appropriate to use the raw counts directly. Instead, one should standardize the network statistics to make them comparable over time. We propose to standardize the raw count of each feature by its maximal possible value, and use the standardized statistics in model (5). Table 1 provides examples of standardized network statistics for a binary graph $X_{t}$ with $n_{t}$ nodes, where $x^{t}_{ij}$ represents whether there is an edge from node $i$ to node $j$ in $X_{t}$ .

The coefficients $\bm{\phi}(t)$ in model (5) characterize the influence of the corresponding network statistics on determining the network structure. When a dynamic network evolves gradually over time, it is reasonable to believe the coefficients will also change gradually. In such a case, $\bm{\phi}(t)$ can be represented by smooth functions of $t$ with continuous second order derivatives over $[0,T]$ [35]. The temporal dependence among the graphs in $\mathbf{X}$ are captured by the dynamic pattern of the coefficient functions. This is the fundamental assumption of our model. In the special case where all the separate functions in $\bm{\phi}(t)$ are constant, the generative models underlying the dynamic networks are identical over time and the VCERGM reduces to a family of marginally identically distributed ERGMs. In Section 4, we introduce a formal hypothesis testing procedure to test the temporal heterogeneity of the coefficients.

3 Estimation

3.1 Spline-Based Representation of Time-Varying Coefficients

Without any constraint, the collection of coefficients $\{\bm{\phi}(t):0\leq t\leq T\}$ contain an infinite number of parameters, making inference on (5) intractable. To address this problem, we approximately represent these smooth functions as a linear combination of basis functions. Possible strategies of defining basis functions include piecewise polynomials [7], Fourier series [30], and wavelets [6]. For inferential purposes, we employ basis splines (b-splines) [7, 10] as a way to reduce the dimensionality of estimation. B-splines are commonly used due to its flexibility in incorporating smoothing constraints.

In particular, we first specify a collection of basis functions $B_{1}(t),\ldots,B_{q}(t)$ , $0\leq t\leq T$ , and then approximate $\phi_{k}(t)$ by a linear combination of these functions

[TABLE]

where $\Phi_{k\ell}$ quantifies the contribution of the $\ell$ th basis function on $\phi_{k}(t)$ . Let $\bm{\Phi}=(\Phi_{k\ell};\,k=1,\ldots,p,\,\ell=1,\ldots,q)$ denote the $p\times q$ basis coefficient matrix and let $\mathbf{B}(t)=(B_{1}(t),\ldots,B_{q}(t))^{T}$ be the length $q$ vector of basis functions. We can represent the coefficients $\bm{\phi}(t)$ as

[TABLE]

The set of $q$ basis functions represents the smoothness of $\bm{\phi}(t)$ , and the coefficient matrix $\bm{\Phi}$ determines the shape and trajectory of the fluctuations through time. Under the basis representation in (6), the distribution of $\mathbf{X}_{t}$ in (5) is fully specified by the $pq$ parameters in the coefficient matrix $\bm{\Phi}$ .

3.2 Fast Estimation via Maximum Pseudo Likelihood

For an observed dynamic sequence of unweighted graphs $\mathbf{x}=\{x_{s}\in\{0,1\}^{n_{s}\times n_{s}}:s=t_{1},\ldots,t_{K},t_{j}<t_{j+1}\in[0,T]\}$ , our goal is to estimate the coefficients $\{\bm{\phi}(t):0\leq t\leq T\}$ given the sequence $\mathbf{x}$ . Let $\mathbf{B}_{s}=\{B_{s,\ell};\,\ell=1,\ldots,q\}$ be a vector of length $q$ of which elements are the basis functions evaluated at time $s$ . By applying the basis representation in (6), we denote $\bm{\phi}_{s}=\bm{\Phi}\mathbf{B}_{s}$ as the smooth function $\bm{\phi}(\cdot)$ evaluated at time $s$ . Therefore, this estimation reduces to the task of estimating the $p\times q$ coefficient matrix $\bm{\Phi}$ . A major obstacle in obtaining the maximum likelihood estimators of the parameters in Model (5), similar to that of fitting an ERGM, is that calculation of the normalizing constant in the denominator is computationally intractable. Although numerical approaches such as the Markov chain Monte Carlo method can be used to estimate $\bm{\Phi}$ for small networks [28, 50], the computational cost is prohibitive for moderate to large networks, let alone a sequence of networks. To alleviate the computational complexity, we exploit a maximum pseudo-likelihood approach, originally adapted for fitting the ERGM [43, 46, 48]. We show that the maximum pseudo-likelihood estimator (MPLE) for the VCERGM can be efficiently obtained via maximum likelihood estimation of a logistic regression model. Below we describe the estimation procedure in more detail.

Without loss of generality, we assume that the numbers of nodes in different networks are the same over time, i.e., $n_{s}\equiv n$ for all $s$ . In the case where $n_{s}$ varies over time, one can simply use the normalized network statistics described in Section 2. For each observed time point $s=t_{1},\ldots,t_{K}$ , let $X^{s}_{ij}$ denote the binary random variable that describes whether or not there is an edge between node $i$ and node $j$ at time $s$ . Furthermore, let $\mathbf{X}^{s}_{-(ij)}$ collection of ${n\choose 2}-1$ binary random variables that describe whether or not there is an edge between all other pairs of nodes other than the node pair $i$ and $j$ . For each observed time point $s=t_{1},\ldots,t_{K}$ , assume the conditional independence between edges. The marginal pseudo-likelihood function of $\bm{\Phi}$ given $x_{s}$ at time $s$ is defined as

[TABLE]

Subsequently, the marginally independent composite pseudo likelihood of model (5) is

[TABLE]

The MPLE $\widehat{\bm{\Phi}}$ is obtained by maximizing $\text{PL}(\bm{\Phi}|\mathbf{x})$ .

Let $x_{s,ij}^{+}$ denote an $n\times n$ matrix whose $(i,j)$ th entry $x^{s}_{ij}=1$ and let $x_{s,ij}^{-}$ be the same $n\times n$ matrix except $x^{s}_{ij}=0$ . Define $\bm{\Delta}^{s}_{ij}=\mathbf{h}(x_{s,ij}^{+})-\mathbf{h}(x_{s,ij}^{-})$ as the vector describing the element-wise difference in the network statistics when $x^{s}_{ij}$ changes from 0 to 1. One can readily show that for each $s=t_{1},\ldots,t_{K}$ , the following relationship holds for all $i,j\in[n]$ :

[TABLE]

Let $Y_{ij}^{s}=\text{logit}\{\mathbb{P}(X^{s}_{ij}=1|\mathbf{X}^{s}_{-(ij)}=\mathbf{x}^{s}_{-(ij)})\}$ and let $\mathbf{Y}_{s}=(Y_{11}^{s},Y_{12}^{s},\ldots,Y_{nn}^{s})^{T}$ . Similarly, define $\bm{\Delta}_{s}=(\bm{\Delta}^{s}_{11},\bm{\Delta}^{s}_{12},\ldots\bm{\Delta}^{s}_{nn})$ as the $p\times\binom{n}{2}$ matrix whose $r$ th row contains the change in the $r$ th network statistic when each edge changes from 0 to 1. Let $\text{vec}(X)$ be the operator that stacks the columns of $X$ into a column vector, and let $\otimes$ represent the Kronecker product operator. Combining (6) and (3.2) yields

[TABLE]

Let $\mathbf{Y}=(\mathbf{Y}_{t_{1}},\cdots,\mathbf{Y}_{t_{K}})^{T}$ , and define the $K\binom{n}{2}\times pq$ design matrix $\mathbf{H}$ as

[TABLE]

The relationship in (9) connects to a logistic regression where $\mathbf{H}$ represents a design matrix with its coefficient $\text{vec}(\bm{\Phi})$ . In [43], it was shown that maximizing the pseudo-likelihood PL( $\bm{\Phi}|x_{s})$ in (7) is equivalent to finding the maximum likelihood estimator (MLE) of $\bm{\Phi}$ in the logistic regression model given in (3.2) with independent entries $X_{ij}^{s}$ . It follows from the independence of $\mathbf{X}_{s}$ and $\mathbf{X}_{s^{\prime}}$ for $s\neq s^{\prime}$ that maximizing PL( $\bm{\Phi}|\mathbf{x}$ ) is equivalent to calculating the MLE of $\bm{\Phi}$ in the logistic regression model $\mathbf{Y}=\mathbf{H}^{T}\text{vec}(\bm{\Phi})$ treating $\{X_{ij}^{s}:i,j\in[n],s=t_{1},\ldots,t_{K}\}$ as mutually independent variables.

3.3 Penalized Logistic Regression

To obtain smooth estimates of the time-varying coefficients $\bm{\phi}(t)$ , we further consider a roughness penalty on the coefficients of the basis functions [see 10, 25, for example]. A commonly used penalty, which we use throughout this paper, is the integrated squared second derivative defined for $k$ th row of $\bm{\Phi}$ , denoted as $\bm{\Phi}_{(k)}$ , as

[TABLE]

where a smoothness matrix $\bm{\Omega}$ in this case can be specified as

[TABLE]

For networks observed at discrete time points $t_{1},\ldots,t_{K}$ , the $(i,j)$ th element of $\bm{\Omega}$ is

[TABLE]

For more examples of possible penalties, see the Chapter 5 in Ramsay [35]. As the same collection of basis functions are used to express $\phi_{k}(t)$ , $k=1,\ldots,p$ , via basis representation, we impose the same $\bm{\Omega}$ on all $\phi_{k}(t)$ . Consequently, we add the penalty term

[TABLE]

to the logistic log pseudo likelihood function. In conclusion, we calculate the penalized pseudo-likelihood estimator $\widehat{\bm{\Phi}}_{\Omega}$ by maximizing the following penalized log likelihood with tuning parameter $\lambda$ :

[TABLE]

To fit (10), we implement an iteratively reweighted least squares (IRLS) algorithm. A detailed description of this procedure is available in supplementary materials.

4 Testing for Heterogeneity

A key assumption of the VCERGM is that the effects of a specified collection of statistics vary through time. This assumption reflects heterogeneity in an observed sequence of graphs $\mathbf{x}$ , and provides intuition as to whether or not summaries of $\mathbf{x}$ can be treated in aggregate. One can formally test for heterogeneity in $\mathbf{x}$ using a likelihood ratio test (LRT), which we will now describe.

We begin with the preconceived notion that $\mathbf{x}$ is homogeneous, namely that the coefficients $\bm{\phi}(t)$ under model (5) are fixed as constants over time. This serves as the null model, under which the VCERGM is equivalent to fitting independent and identically distributed ERGMs. With fixed constants $\phi_{1}^{0},\ldots,\phi_{p}^{0}$ , the null hypothesis corresponding to a homogeneous sequence of graphs can be written as

[TABLE]

Setting the function $\phi_{k}(t)=\phi_{k}^{0}$ is equivalent to writing $\Phi_{k\ell}=\phi^{0}_{k}$ for all $\ell=1,\ldots q$ . As a result, the null hypothesis in (11) can be expressed more succinctly as

[TABLE]

where $\mathbf{1}_{q}$ is length $q$ vector of 1’s. The coefficients under the null hypothesis are the restricted form of the VCERGM where the basis coefficients for each network statistic are constants for all $q$ basis functions.

The likelihood ratio test (LRT) is commonly used for conducting the test for heterogeneity in varying-coefficient models [4, 13, 11, 12]. Due to the dependence between entries in each graph, we utilize a pseudo likelihood ratio test (pLRT) [42]. As previously emphasized in Section 3, the joint pseudo likelihood consists of the distribution of $X_{ij}^{s}$ given the rest of the data $\mathbf{X}_{-(ij)}^{s}$ for all $i,j\in[n],s=t_{1},\ldots t_{K}$ . Furthermore, maximizing the pseudo likelihood simplifies the estimation process as fitting a logistic regression. Namely, with observed networks $\mathbf{x}=\{x_{s}:s=t_{1},\ldots,t_{K}\}$ with $n$ nodes, the pLRT compares the pseudo log likelihood function below under the null and alternative hypotheses:

[TABLE]

Let $\hat{\bm{\Phi}}_{\text{H0}}$ and $\hat{\bm{\Phi}}_{\text{H1}}$ be the estimates of $\bm{\Phi}$ under the null and alternative hypotheses. The estimate $\hat{\bm{\Phi}}_{\text{H1}}$ can be calculated by fitting the VCERGM specified in (5) and $\hat{\bm{\Phi}}_{\text{H0}}$ is the estimate from the VERCM with a restriction of constant basis coefficients. Accordingly, let $\log\text{PL}(\hat{\bm{\Phi}}_{\text{H0}}|\mathbf{x})$ and $\log\text{PL}(\hat{\bm{\Phi}}_{\text{H1}}|\mathbf{x})$ denote the pseudo log likelihood functions under the null and alternative, respectively. Then, the test statistic is

[TABLE]

We reject the null hypothesis when $T>C_{\alpha}$ where $C_{\alpha}$ is the critical value of the test with significance level $\alpha$ . In general, there exist two strategies commonly used to compute $C_{\alpha}$ . The first approach is to apply Wilks phenomenon for deriving the asymptotic null distribution of the test statistic [2, 13]. Assumptions required to validate the asymptotic chi-squared distribution of the test statistic are presented in [13]. Even though Wilks phenomenon is readily applicable to our case, calculating the degrees of freedom for asymptotic chi-squared distribution is difficult, and the chi-squared distribution may not be appropriate when the network size is relatively small. Another approach, which is preferable for moderate network size, involves generating bootstrap samples to construct the null distribution of $T$ [4, 12, 26]. Analogous to the work in [8, 33, 45], the steps of obtaining the critical value $C_{\alpha}$ or calculating the $p$ -value with parametric bootstrapping can be described as follows. For a large value of $B$ , the test statistics (12) calculated based on $B$ bootstrap samples successfully represent the null distribution of $T$ .

Create $B$ bootstrap samples. For each bootstrap, indexed by $b=1,\ldots,B$ , $\mathbf{x}^{*(b)}=\{x_{s}^{*(b)}:s=t_{1},\ldots,t_{K}\}$ is a sample from $\mathbb{P}(\mathbf{X}|\hat{\bm{\Phi}}_{\text{H0}})$ . 2. 2.

For each bootstrap sample $\mathbf{x}^{*(b)}$ , estimate $\bm{\Phi}$ under the null and alternative hypotheses and denote them as $\hat{\bm{\Phi}}_{\text{H0}}^{*(b)}$ and $\hat{\bm{\Phi}}_{\text{H1}}^{*(b)}$ , respectively. 3. 3.

Calculate the test statistic for each bootstrap sample as

[TABLE] 4. 4.

The critical value $C_{\alpha}$ is determined as the $(1-\alpha)$ th quantile of $(T^{*(1)},\ldots,T^{*(B)})$ . The $p$ -value is the proportion of times that the bootstrap test statistic values exceed the observed test statistic $T$ . Define an indicator function $I(A)$ which takes a value of 1 if $A$ is true and 0 otherwise. Then the $p$ -value can be written as

[TABLE]

The $p$ -value is then used to determine whether or not to reject the null hypothesis. For values below a specified significance value, $\alpha$ , one rejects the null hypothesis in (11) and decides that the sequence of networks does exhibit heterogeneity in its parameters. In our applications below, we choose $\alpha=0.05$ when evaluating any hypothesis test.

5 Simulation Study

The goal of our simulation study is two-fold: (i) to evaluate the power of the hypothesis testing procedure described in Section 4, and (ii) to assess the goodness of fit of the VCERGM on dynamic networks with various magnitudes of temporal heterogeneity. In Section 5.1, we evaluate the sensitivity of the hypothesis test in (11) for detecting temporal heterogeneity in a sequence of networks with fluctuating parameters using both the bootstrap procedure and the chi-squared approximation. Section 5.2 assesses the performance of the VCERGM under various varying-coefficient specifications. We compare the performance of the VCERGM with other competing methods using measurements of bias, variance, and robustness. We further investigate how the VCERGM performs when the networks are observed at unequally spaced time points due to missing networks.

5.1 Power Evaluation for Testing Heterogeneity

We first investigate the power of the hypothesis test for heterogeneity that we introduce in Section 4. To do so, we investigate both Type I and Type II errors of the test on dynamic networks over various magnitudes of temporal heterogeneity. We simulate 100 sequences of dynamic networks $\mathbf{x}=\{\mathbf{x}_{1},\ldots,\mathbf{x}_{100}\}$ , where each sequence $\mathbf{x}_{w}=\{x_{w,1},\ldots,x_{w,K}\}$ , contains $K$ networks with 30 nodes observed at equally-spaced times $t_{1},\ldots,t_{K}$ under the VCERGM that models the temporal contributions of the edge density statistic. We set the coefficient on the edge density term, $\phi(t)$ , to be a sinusoidal curve with amplitude $M$ and period $T$ . In particular, we model

[TABLE]

We vary the number of observed time points $K$ from 10 to 100, and the amplitude $M$ from 0 to 0.3 in increments of 0.05. For each value of $K$ and $M$ , we calculate the proportion of cases that we reject the null hypothesis at a $\alpha=0.05$ level out of the 100 simulated dynamic network sequences. Table 2 reports these proportions when using the bootstrap procedure as well as the chi-squared approximation.

When $M=0$ , $\phi(t)$ is a constant function and as a result the proportion of rejections in this case provides an estimate for the Type I error of each test. From Table 2, we see that both strategies obtain a Type I error at or below 0.05, as desired. For $M>0$ , the proportion of rejections provides an estimate of the power of the test. We see that for higher signal (larger $M$ ) and for a larger number of observed networks (larger $K$ ), we obtain a higher power, as expected. Across $K$ , we see in general that the bootstrap procedure is consistently more powerful than the use of the chi-squared reference distribution for each amplitude value $M$ . For $M>0.25$ the power of both tests reaches 1, indicating that heterogeneity is successfully identified by both tests. These results suggest that both tests are powerful for large enough signal size, and that the bootstrap procedure outperforms the chi-squared approximation for small signal sizes (between $M=0.05$ and $0.20$ ).

5.2 Estimation Performance

We now evaluate the performance of VCERGM to accurately estimate fluctuating parameters $\bm{\phi}(t)$ , $t\in[0,T]$ . We consider four different settings for $\bm{\phi}(t)$ : (i) sinusoidal curve $\phi(t)=a\sin\{(t+b)/c\}+d$ of varying amplitude $a$ ; (ii) quadratic curve $\phi(t)=a(t-T/2)^{2}+b$ of varying strength $a$ ; (iii) dynamic Erdős-Rényi random graph with probability $p$ of edges; and (iv) non-smooth (spiky) functions as a form of a sequence of random numbers with varying mean and standard deviation for normal distribution. For each setting of varying coefficients, we model the occurrence of graphs using the VCERGM with edge density and reciprocity statistics (see Table 1). We simulate 100 dynamic sequences of directed graphs $\{\mathbf{x}_{1},\ldots,\mathbf{x}_{100}\}$ where each sequence $\mathbf{x}_{w}=\{x_{w,1},\ldots,x_{w,50}\}$ is observed at $K=50$ equally-spaced time points. We assume that the network size remains constant through time and consider estimation with networks of three different sizes $n=30,50,100$ . Furthermore, we repeat (i)-(iv) with 1, 5, and 10 randomly chosen networks removed from the time series to evaluate the performance on dynamic networks with observations missing at random.

For each simulated dynamic network, we compare the VCERGM with two other dynamic network models. First, we fit cross-sectional ERGMs, where the ERGM in model (1) is fit separately at each of the $K$ observed time points. As an alternative competitive method, we also develop an ad hoc 2-step procedure, which adapts an ad hoc smoothing procedure after fitting cross-sectional ERGMs for observed networks. To assess the performance of each method, we calculate the integrated absolute error (IAE) of the estimated coefficient curves. It measures the sum of point-wise absolute difference between estimated curve $\hat{\phi}(t)$ and true curve $\phi(t)$ at observed time points $t_{1},\ldots,t_{K}$ , namely

[TABLE]

The mean and standard deviation (SD) are calculated to evaluate the performance of our proposed method compared to cross-sectional ERGMs and ad hoc 2-step procedure. We provide the summary of IAE for each method on dynamic networks with 30 nodes in Table 3 with $(0,1,5,10)$ missing networks. Settings for the results are (i) sinusoidal curves with $(a,b,c,d)=(1,20,5,1)$ (edges) and $(a,b,c,d)=(0.6,20,3,0.4)$ (reciprocity); (ii) quadratic curves with $(a,b)=(1/25^{2},0)$ (edges) and $(a,b)=(-1/30^{2},0.5)$ (reciprocity); (iii) Erdős-Rényi with $p_{edges}=0.85$ ; (iv) a sequence of random numbers from $N(0,1)$ (edges) and $N(1.5,0.6)$ (reciprocity). The performances of cross-sectional ERGMs, ad hoc 2-step procedure, and VCERGM become more comparable with larger network size. For results of $n=50$ and $n=100$ case, see Tables 5 and 6 in supplementary materials.

Without missing network, Figure 3 shows that cross-sectional ERGMs are more likely to introduce unexpected spikes or increased variability in estimating true $\bm{\phi}(t)$ , compared to VCERGM. Overall, the VCERGM presents smaller deviation from true $\bm{\phi}(t)$ with smaller variability compared to cross-sectional ERGMs and ad hoc 2-step procedure. In the first three settings, the VCERGM performs better than cross-sectional ERGMs and ad hoc 2-step procedure with smaller IAE. In case of non-smooth functions, with true $\bm{\phi}(t)$ not smooth but wiggly, the ad hoc 2-step procedure shows better performance than the VCERGM with respect to IAE. Both Table 3 and Figure 3 indicate that the VCERGM potentially misses random wiggles, which causes greater bias on average compared to cross-sectional ERGMs. Regardless, overall trend that the true non-smooth $\bm{\phi}(t)$ presents is fairly well estimated by the VCERGM and the variability of the estimates is still smaller for the VERGM compared to cross-sectional ERGMs. Overall, the performance of ad hoc 2-step procedure and VCERGM is comparable, but the VCERGM is more principled in terms of incorporating time-varying coefficients in the modeling step. For all four settings, the VCERGM is computationally more efficient than the cross-sectional ERGMs. We conduct an additional simulation study specifically tailored to compare the computing time between methods and the results are presented in Table 4.

When there exist missing networks, cross-sectional ERGMs are no longer available to provide the estimates at unobserved time points. Therefore, the IAE is calculated only for ad hoc 2-step procedure and VCERGM. Notably, the performance of the VCERGM remains stable across each number of missing networks. Cross-sectional ERGMs and the 2-step approach, on the other hand, suffer more than the VCERGM in the case of missing networks. Indeed, as shown in Table 3, the VCERGM outperforms these competitive methods in the case that observations are missing and is better able to capture the true coefficient curve in these cases.

In order to compare the computational efficiency, we vary the number of time points $K$ and record the computing time for VCERGM and cross-sectional ERGMs. Table 4 summarizes the computing times of 500 simulated dynamic network sequences and displays how computing time changes as the number of time points $K$ changes. According to Table 4, the VCERGM takes significantly less time than cross-sectional ERGMs to complete the parameter estimation. Even if both VCERGM and cross-sectional ERGMs show linear increase in computing time, the rate of change is much smaller for VCERGM. Both methods entail $K$ separate steps to construct design matrix and response vector at each time point, but the cross-sectional ERGMs require $K$ separate MPLE steps while VCERGM only needs one estimation. In other words, the longer the time series of networks are, the more efficient VCERGM is compared to cross-sectional ERGMs.

6 Applications

In this section we apply the VCERGM to two case studies which portray differing amounts of temporal heterogeneity. First, we analyze how the co-voting patterns among U.S. Senators have changed through time. In this example, we analyze the effects of political affiliation (Republican or Democrat) on the likelihood of the voting networks. In our second example, we investigate the structural changes of the resting-state functional magnetic resonance imaging (rfMRI) records of healthy individuals. For each case study, we first test for temporal heterogeneity of any statistic included in the model. We fit the VCERGM on the two data sets and also fit cross-sectional ERGMs and ad hoc 2-step procedure for comparison. In the first example, we witness a clear evidence of temporal heterogeneity that the importance of political affiliation on the likelihood of the voting networks significantly changes over time. On the contrary, relatively stable rfMRI networks show little fluctuations through time and our method is shown to accommodate it successfully as well.

6.1 Political Voting Network

We begin by analyzing the dynamic network that describes the co-voting patterns among U.S. Senators from 1867 (Congress 40) to 2015 (Congress 113). Three of the voting networks are shown in Figure 1. This network was first investigated in Moody and Mucha [34] and has been subsequently analyzed in Wilson et al. [49]. The network is based off of the roll call voting data from http://voteview.com, which contains the voting decision of each Senator (yay, nay, or abstain) for every bill brought to Congress. We model the co-voting tendencies of the Senators using a dynamic network where nodes represent Senators and an edge is formed between two nodes if the two Senators vote concurrently (both yay or both nay) on at least 75% of the bills to which they were both present.

As shown in Figure 1, there exist great fluctuations in network structure through time. Previous analyses in Moody and Mucha [34], Wilson et al. [49] have identified significant changes in the community structure of the network over time, and that this community structure is closely associated with political affiliation of the Senators. To account for these fluctuations, we incorporate a node-match statistic for political affiliation which counts the number of edges shared between Senators who have the same political affiliation. Furthermore, we include a statistic which models popularity (two-star) and clustering (triangle) of the Senator networks. See Table 1 for more details of these two statistics. We note that it is of separate interest to perform model selection for the VCERGM; however, we pursue this in future work. Here, we compare and contrast the results of a simple model when using several viable dynamic network models.

The estimated parameters from i) cross-sectional ERGMs (ERGM), ii) ad hoc 2-step procedure (ERGM2) and iii) VCERGM are presented in Figure 4. Notably, all three network statistics exhibit temporal heterogeneity. The asymptotic chi-squared $p$ -value for testing heterogeneity is $<.001$ . Consistent with our simulation results, the cross-sectional ERGMs exhibits spiky estimates, but the ad hoc smoothing recovers the lack of smoothness efficiently and produces similar estimates as the VCERGM. The political nodematch parameter estimate reveals an important trend in the political network. We see that the coefficient value for this term generally increases over time, indicating the increasing importance of political affiliation on the co-voting habits of the Senators throughout U.S. history. In particular since Congress 95, this coefficient has significantly increased. This finding matches the current theory of “political polarization” described in [34]. Figure 4 also suggests that the triangle coefficient has decreased in magnitude suggesting that clustering has become less and less important over time.

6.2 fMRI Dataset

We next analyze the structure of brain connectivity in the data provided by the WU-Minn Consortium Human Connectome Project (HCP). The dataset is available at https://db.humanconnectome.org. See Van Essen et al. [47] for an overview of data acquisition and analysis. The dataset includes the resting-state functional magnetic resonance imaging (rfMRI) of 500 subjects. For each subject, a 15-minute run of rfMRI is recorded. We set 47 local windows and calculate a partial precision matrix between 50 brain regions based on observations within each window. For a transition from precision matrices to sequence of dynamic networks, we define the edge density of a network as the proportion of edges in the network. Once the edge density is specified, the threshold of partial precision value can be determined to form an edge between the brain regions. With the edge density of 10%, for example, the greatest 10% of partial precision values would form edges.

References [39] and [40] were among the first to fit ERGMs to functional brain networks. Their final model included network statistics such as geometrically weighted edge-wise shared partner (GWESP) and geometrically weighted non-edge-wise shared partner (GWNSP). For sake of comparison, we model our rfMRI networks with three network statistics: edges, triangle and two-star and compare i) cross-sectional ERGMs (ERGM), ii) ad hoc 2-step procedure (ERGM2) and iii) VCERGM. We note that model selection for the VCERGM (and other dynamic network models in general) is an important area for future research.

Figure 5 shows the results of two individuals from this study. As the data are the resting-state fMRI records, little fluctuation is expected in parameters over time. For both individuals, both ad hoc 2-step procedure and VCERGM provide estimates with a small range of fluctuation for all three network statistics. Overall, the ad hoc 2-step procedure and VCERGM provide relatively similar estimates, while both estimates cross the cross-sectional ERGM estimates. The estimates from cross-sectional ERGMs are extremely jagged that they may introduce inaccurate inference with regard to explaining the topological change in brain networks over time. The VCERGM not only produces fairly static estimates but also captures small variations through time more sensitively than ad hoc 2-step procedure. Therefore, even with relatively stable dynamic networks, the VCERGM performs consistently well.

7 Discussion

In this paper, we introduce varying-coefficient models for dynamic networks. In particular, we described the formulation and estimation of the VCERGM, a model that incorporates temporal changes in the coefficients of an exponential random graph family of models. We demonstrated the advantages of applying the VCERGM over competing methods through simulations and two dynamic network case studies. First, the VCERGM provides an intuitive explanation of how a network changes through time. Both the cross-sectional ERGMs and ad hoc 2-step procedure seemed to capture the temporal heterogeneity in a sense. However, by incorporating the temporal heterogeneity in the modeling step, the VCERGM provides a compact and meaningful model to formally explain the temporal structure of dynamic networks. Second, the VCERGM is robust to perturbations in observed temporal data. By imposing smoothness on the coefficients, we are able to provide robust estimates that are resistant to outliers and noise. Third, the VCERGM enables interpolation for missing networks through time. In practice, one can only observe a finite number of networks in a dynamic sequence, which may be observed in unequally spaced time increments. Estimates of the coefficients to the VCERGM can be evaluated at any time point in the domain and immediately interpreted as the impact of network statistics at that time point. By presenting the results with unequally-spaced networks, we illustrated how the varying-coefficients through time can be useful especially in terms of interpolation.

Our work provides several avenues for future research. First, it is important to consider the evaluation of goodness of fit and model selection in a dynamic context. Through empirical exploration, we found that the network statistics used to fit a model are often highly correlated. For example, if there exists a triangle in a network, it is more likely to find two-stars in the network. Model identifiability should be investigated both in static ERGM models and the VCERGM to ensure appropriate model selection. For static ERGMs, one generally assesses goodness of fit through a comparison of quantitative summaries of simulated networks from the fitted model with the summaries of the observed network [27]. However, for dynamic networks this type of goodness of fit comparison captures only the marginal aspects of the dynamic sequence. How exactly to assess the quality of a dynamic model is still an open problem. A second avenue for future work involves adapting the varying-coefficient framework introduced here to networks with weighted edges. To do this, one can extend the exponential models of networks for integer-valued weights from [31] or to the models of networks for continuous-valued weights considered in [9, 50].

In many dynamic networks, it is often of interest to identify change-points in the network, namely points in time where the network undergoes significant local or global structural change [51, 1]. It would be interesting to further analyze how to utilize dynamic network models like the VCERGM to identify such changes. The test for heterogeneity that we use in the paper may provide some idea of how to formally test for a change - through identification of a change in network parameter - however, in future research we plan to purse this idea further.

Finally, we will investigate adapting a varying-coefficient model for networks with a Markov dependency. In this paper, we investigated the dynamics of coefficients for marginal network statistics. However, this model can readily be extended to networks with a Markov dependency like that described by the first order TERGM. In general, for any non-negative integer $q$ , one can model higher order dependencies between $q$ successive networks. For example if we assume the one-step Markov dependence in (2), we can model the one step transition between $X_{t-1}$ and $X_{t}$ using a suite of statistics $\mathbf{g}_{1}(x_{t},x_{t-1})$ as

[TABLE]

In model (13), $\bm{\phi}_{1}(t)=\{\phi_{1k}(t),k=1,\ldots,p\}$ can be modeled as smooth functions that describe the impact of the one-step transition statistics from $x_{t-1}$ to $x_{t}$ . Therefore, model (13) effectively captures the rate of change of the temporal potential between sequential graphs rather than the rate of change of the marginal features as done in this work. Like the TERGM, we can generalize the VCERGM to a higher order Markov dependency, say order $q>1$ , by specifying appropriate transition statistics $\mathbf{g}_{q}(x_{t},x_{t-1},\ldots,x_{t-q})$ .

Acknowledgements

The fMRI data were provided [in part] by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.

Bibliography52

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Bindu and Thilagam [2016] PV Bindu and P Santhi Thilagam. Mining social networks for anomalies: Methods and challenges. Journal of Network and Computer Applications , 68:213–229, 2016.
2Boucheron and Massart [2011] Stéphane Boucheron and Pascal Massart. A high-dimensional wilks phenomenon. Probability theory and related fields , 150(3-4):405–433, 2011.
3Brockwell and Davis [2013] Peter J Brockwell and Richard A Davis. Time series: theory and methods . Springer Science & Business Media, 2013.
4Cai et al. [2000] Zongwu Cai, Jianqing Fan, and Runze Li. Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association , 95(451):888–902, 2000.
5Cranmer and Desmarais [2011] Skyler J. Cranmer and A. Desmarais, Bruce. Inferential network analysis with exponential random graph models. Political Analysis , 19(1):66–86, 2011.
6Daubechies et al. [1992] Ingrid Daubechies et al. Ten lectures on wavelets , volume 61. SIAM, 1992.
7De Boor et al. [1978] Carl De Boor, Carl De Boor, Etats-Unis Mathématicien, Carl De Boor, and Carl De Boor. A practical guide to splines , volume 27. Springer-Verlag New York, 1978.
8De Brabanter et al. [2006] Joseph De Brabanter, Kristiaan Pelckmans, JAK Suykens, and Bart De Moor. Generalized likelihood ratio statistics based on bootstrap techniques for autoregressive models. IFAC Proceedings Volumes , 39(1):790–795, 2006.