Consistency of the maximum likelihood and variational estimators in a dynamic stochastic block model

Léa Longepierre (LPSM UMR 8001); Catherine Matias (LPSM UMR 8001)

arXiv:1903.04306 · math.ST · August 30, 2023

TL;DR

This paper proves the consistency and convergence rates of maximum likelihood and variational estimators in a dynamic stochastic block model with evolving node memberships modeled by a hidden Markov chain.

Contribution

It establishes the theoretical consistency and convergence rates of estimators in a dynamic stochastic block model with temporal evolution of node classes.

Findings

1. Proves consistency of estimators as nodes and time steps increase
2. Provides upper bounds on convergence rates of estimators
3. Analyzes a case with fixed time steps and varying connectivity parameters

Abstract

We consider a dynamic version of the stochastic block model, in which the nodes are partitioned into latent classes and the connection between two nodes is drawn from a Bernoulli distribution depending on the classes of these two nodes. The temporal evolution is modeled through a hidden Markov chain on the node memberships. We prove the consistency (as the number of nodes and time steps increase) of the maximum likelihood and variational estimators of the model parameters, and obtain upper bounds on the rates of convergence of these estimators. We also explore the particular case where the number of time steps is fixed and connectivity parameters are allowed to vary.

Equations

$\mathbb{P}(Z_{i}^{t+1}=l\mid Z_{i}^{t}=q)=\gamma_{ql},\quad\forall\,1\leq q,l\leq Q$

$\mathbb{P}_{\theta}(Z_{i})=\alpha_{Z_{i}^{1}}\prod_{t=1}^{T-1}\gamma_{Z_{i}^{t}Z_{i}^{t+1}}.$

$X_{ij}^{t}\mid Z_{i}^{t}=q,Z_{j}^{t}=l\sim\mathcal{B}(\pi_{ql})$

$N_{q}(z^{t})=|\{i\in\llbracket 1,n\rrbracket;z_{i}^{t}=q\}|$ and $N_{ql}(z^{1:T})=\sum_{t=1}^{T-1}\sum_{i=1}^{n}\mathds{1}_{z_{i}^{t}=q,z_{i}^{t+1}=l}.$

$\|\pi-\pi^{\prime}\|_{\infty}=\max_{1\leq q,l\leq Q}|\pi_{ql}-\pi^{\prime}_{ql}|$ and $\|\Gamma-\Gamma^{\prime}\|_{\infty}=\max_{1\leq q,l\leq Q}|\gamma_{ql}-\gamma^{\prime}_{ql}|.$

$\theta_{\sigma}=(\Gamma_{\sigma},\pi_{\sigma})=\left((\gamma_{\sigma(q)\sigma(l)})_{1\leq q,l\leq Q},(\pi_{\sigma(q)\sigma(l)})_{1\leq q,l\leq Q}\right).$

$\|\pi^{1:T}-\pi^{\prime 1:T}\|_{\infty}=\max_{(q,l,t)\in\llbracket 1,Q\rrbracket^{2}\times\llbracket 1,T\rrbracket}|\pi_{ql}^{t}-\pi_{ql}^{\prime t}|.$

$\ell_{c}(\theta;Z^{1:T})$ and $\ell(\theta)$

$\hat{\theta}=(\hat{\Gamma},\hat{\pi})=\operatorname*{argmax}_{\theta\in\Theta}\ell(\theta).$

$M_{n,T}(\Gamma,\pi)=\frac{2}{n(n-1)T}\ell(\theta)=\frac{2}{n(n-1)T}\log\mathbb{P}_{\theta}(X^{1:T})$

$\mathbb{M}(\pi,A)$ and $\mathbb{M}(\pi)$

$\mathbb{P}_{\theta^{*}}\left(\sup_{(\Gamma,\pi)\in\Theta}|M_{n,T}(\Gamma,\pi)-\mathbb{M}(\pi)|>\frac{\epsilon r_{n,T}}{n}\right)\xrightarrow[n,T\to+\infty]{}0.$

$\mathbb{P}_{\theta^{*}}\left(\min_{\sigma\in\mathfrak{S}_{Q}}\|\pi^{*}-\hat{\pi}_{\sigma}\|_{\infty}>\frac{\epsilon r_{n,T}}{n^{1/4}}\right)\xrightarrow[n,T\to\infty]{}0.$

$\mathbb{P}_{\theta^{*}}\left(\min_{\sigma\in\mathfrak{S}_{Q}}\|\pi^{*1:T}-\hat{\pi}_{\sigma}^{1:T}\|_{\infty}>\frac{\epsilon r_{n}}{n^{1/4}}\right)\xrightarrow[n\to\infty]{}0,$

$\forall(q,l)\in\llbracket 1,Q\rrbracket^{2},\quad\breve{\gamma}_{ql}=\frac{\sum_{t=1}^{T-1}\sum_{i=1}^{n}\mathbb{P}_{\breve{\theta}}(Z_{i}^{t}=q,Z_{i}^{t+1}=l\mid X^{1:T})}{\sum_{t=1}^{T-1}\sum_{i=1}^{n}\mathbb{P}_{\breve{\theta}}(Z_{i}^{t}=q\mid X^{1:T})}.$

$\mathcal{E}(z^{1:T},\theta,\epsilon):=\left\{\frac{\mathbb{P}_{\theta}(Z^{1:T}\neq z^{1:T}\mid X^{1:T})}{\mathbb{P}_{\theta}(Z^{1:T}=z^{1:T}\mid X^{1:T})}>\epsilon\right\}.$

$\mathbb{P}_{\theta^{*}}\left(\mathcal{E}(Z^{1:T},\breve{\theta},\epsilon y_{n,T})\right)\leq QT\exp(-2\eta^{2}n)+\mathbb{P}_{\theta^{*}}\left(\|\breve{\pi}-\pi^{*}\|_{\infty}>v_{n,T}\right)+CnT\left\{\exp\Bigg[-(\delta-\eta)^{2}C_{1}n+C_{2}\log(nT)-C_{4}\log(\epsilon y_{n,T})\Bigg]+\exp\Bigg[-C_{3}\frac{(\log(nT))^{2}}{nv_{n,T}^{2}}+3n\log(nT)\Bigg]\right\},$

$\mathbb{P}_{\theta^{*}}\left(\|\hat{\Gamma}_{\sigma}-\Gamma^{*}\|_{\infty}>\epsilon r_{n,T}\frac{\log n}{nT}\right)\leq Q^{2}(3Q+1)\,\mathbb{P}_{\theta^{*}}\left(\|\hat{\pi}_{\sigma}-\pi^{*}\|_{\infty}>v_{n,T}\right)+o(1)$

$\mathbb{P}_{\theta^{*}}\left(\min_{\sigma\in\mathfrak{S}_{Q}}\|\hat{\Gamma}_{\sigma}-\Gamma^{*}\|_{\infty}>\epsilon r_{n,T}\frac{\log n}{nT}\right)\xrightarrow[n,T\to\infty]{}0.$

$\mathbb{P}_{\theta^{*}}\left(\mathcal{E}(Z^{1:T},\breve{\theta},\epsilon y_{n})\right)\leq QT\exp(-2\eta^{2}n)+\mathbb{P}_{\theta^{*}}\left(\|\breve{\pi}^{1:T}-\pi^{*1:T}\|_{\infty}>v_{n}\right)+CnT\left\{\exp\Bigg[-(\delta-\eta)^{2}C_{1}n+C_{2}\log(nT)-C_{4}\log(\epsilon y_{n})\Bigg]+\exp\Bigg[-C_{3}\frac{(\log(nT))^{2}}{nv_{n}^{2}}+5n\log(nT)\Bigg]\right\},$

$\mathbb{P}_{\theta^{*}}\left(\min_{\sigma\in\mathfrak{S}_{Q}}\|\hat{\Gamma}_{\sigma}-\Gamma^{*}\|_{\infty}>\epsilon r_{n}\frac{\log n}{n}\right)\xrightarrow[n\to\infty]{}0.$

$Q_{\chi}(Z^{1:T})=\prod_{i=1}^{n}Q_{\chi}(Z_{i}^{1})\prod_{t=2}^{T}Q_{\chi}(Z_{i}^{t}\mid Z_{i}^{t-1})=\prod_{i=1}^{n}\left\{\left[\prod_{q=1}^{Q}(\tau_{iq}^{1})^{Z_{iq}^{1}}\right]\prod_{t=1}^{T-1}\prod_{1\leq q,l\leq Q}\left(\frac{\eta_{iql}^{t}}{\tau_{iq}^{t}}\right)^{Z_{iq}^{t}Z_{il}^{t+1}}\right\},$

$J(\chi,\theta)=\ell(\theta)-KL(Q_{\chi},\mathbb{P}_{\theta}(\cdot\mid X^{1:T}))=\mathbb{E}_{Q_{\chi}}[\log\mathbb{P}_{\theta}(X^{1:T},Z^{1:T})]+H(Q_{\chi})$

$\hat{\chi}(\theta)=(\hat{\tau}(\theta),\hat{\eta}(\theta))=\operatorname*{argmax}_{\chi\in[0,1]^{T^{2}n^{2}Q^{3}}}J(\chi,\theta),$

$\tilde{\theta}=(\tilde{\Gamma},\tilde{\pi})=\operatorname*{argmax}_{\theta\in\Theta}J(\hat{\chi}(\theta),\theta).$

$\mathbb{P}_{\theta^{*}}\left(\sup_{\theta\in\Theta}\frac{2}{n(n-1)T}J(\hat{\chi}(\theta),\theta)-\mathbb{M}(\pi)>\frac{\epsilon r_{n,T}}{n}\right)\xrightarrow[n,T\to+\infty]{}0.$

$\frac{1}{2}\mathbb{P}_{\theta^{*}}\left(\min_{\sigma\in\mathfrak{S}_{Q}}\|\tilde{\pi}_{\sigma}-\pi^{*}\|_{\infty}>\frac{\epsilon r_{n,T}}{n^{1/4}}\right)\xrightarrow[n,T\to\infty]{}0.$

$\frac{1}{2}\mathbb{P}_{\theta^{*}}\left(\min_{\sigma\in\mathfrak{S}_{Q}}\|\tilde{\pi}_{\sigma}^{1:T}-\pi^{*1:T}\|_{\infty}>\frac{\epsilon r_{n}}{n^{1/4}}\right)\xrightarrow[n\to\infty]{}0.$

$\forall(q,l)\in\llbracket 1,Q\rrbracket^{2},\quad\breve{\gamma}_{ql}=\frac{\sum_{i=1}^{n}\sum_{t=1}^{T-1}\breve{\eta}_{iql}^{t}}{\sum_{i=1}^{n}\sum_{t=1}^{T-1}\breve{\tau}_{iq}^{t}}.$

Taxonomy

Topics: Stochastic processes and statistical mechanics · Bayesian Methods and Mixture Models · Random Matrices and Applications

Full text

Léa Longepierre and Catherine Matias

Sorbonne Université, Université Paris Diderot, Centre National de la Recherche Scientifique,

Laboratoire de Probabilités, Statistique et Modélisation,

4 place Jussieu, 75252 PARIS Cedex 05, FRANCE.

{lea.longepierre,catherine.matias}@sorbonne-universite.fr

Keywords: maximum likelihood estimation, dynamic network, dynamic stochastic block model, variational estimation, temporal network

1 Introduction

Random graphs are a suitable tool to model and describe interactions in many kinds of datasets such as biological, ecological, social or transport networks. Here we are interested in time-evolving networks, which are a powerful tool for modeling real-world phenomena, where the role or behaviour of the nodes in the network and the relationships between them are allowed to change over time. Indeed, it is important to take into account the evolutionary behaviour of the graphs, instead of just studying separate snapshots as static graphs. We focus on graphs evolving in discrete time and refer to Holme (2015) for an introduction to dynamic networks.

A myriad of dynamic graph models have been introduced in the past few years; see for instance Zhang et al. (2017). We focus here on those which are based on the (static) stochastic block model (SBM, Holland et al., 1983) in which the nodes are partitioned into classes. In the SBM, class memberships of the nodes are represented by latent variables and the connection between two nodes is drawn from a distribution depending on the classes of these two nodes (a Bernoulli distribution in the case of binary graphs). A first dynamic version of the SBM with discrete time is proposed in Yang et al. (2011). There, the nodes are partitioned into $Q$ classes and the graphs are binary or weighted. The nodes are allowed to change membership over time, and these changes are governed by independent Markov chains with values in the $Q$ classes, while the connection probabilities are constant over time. Xu and Hero (2014) introduce a state-space model on the logit of the connection probabilities for dynamic (binary) networks with connection probabilities and group memberships varying over time. Unfortunately, their model presents parameter identifiability issues (Matias and Miele, 2017). Xu (2015) proposes a stochastic block transition model in which the presence or absence of an edge between two nodes at a particular time affects the presence or absence of such an edge at a future time. There, the nodes can change classes over time, new nodes can enter the network, and the connection probabilities are allowed to vary over time. The models in Matias and Miele (2017) and in Becker and Holzmann (2018) are quite similar to that of Yang et al. (2011), except that they allow the connection probabilities to vary; the latter is moreover nonparametric. Bartolucci et al. (2018) extend the model of Yang et al. (2011) to deal with different forms of reciprocity in directed graphs, by directly modeling dyadic relations and with the assumption that the dyads are conditionally independent given the latent variables. Paul and Chen (2016) and Han et al. (2015) study multi-graph SBMs, arising in settings including dynamic networks and multi-layer networks where each layer corresponds to a type of edge. In these two models, the node memberships stay constant over the layers. Pensky (2019) and Pensky et al. (2019) study a dynamic SBM for undirected and binary edges where both connection probabilities and group memberships vary over time, assuming that the connection probabilities between groups are a smooth function of time. Xing et al. (2010) and Ho et al. (2011) introduce dynamic versions of the mixed-membership stochastic block model, allowing each actor to carry out different roles when interacting with different peers. Zreik et al. (2016) introduce the dynamic random subgraph model, given a known decomposition of the graph into subgraphs, in which the latent class membership depends on the subgraph membership and the edges are categorical variables, their types being sampled from a distribution depending on the latent classes of the two nodes. There, a state-space model is used to characterize the temporal evolution of the latent class proportions.

As far as estimation is concerned, different methods of inference are proposed to estimate groups and model parameters. The maximum likelihood estimator (MLE) is not tractable in the SBM, nor therefore in its dynamic versions. Variational methods are a popular way to approximate the MLE (Xing et al., 2010; Ho et al., 2011; Han et al., 2015; Paul and Chen, 2016; Zreik et al., 2016; Matias and Miele, 2017; Bartolucci et al., 2018). Yang et al. (2011) rely on Gibbs sampling and simulated annealing. Pensky et al. (2019) propose an estimator of the connection probabilities matrix at each time step by a discrete kernel-type method and obtain a clustering of the nodes thanks to spectral clustering on this estimated matrix. They also give an estimator for the number of clusters. Spectral clustering algorithms are also used by Han et al. (2015) on the mean graph over time and by Liu et al. (2018), who use eigenvector smoothing to get some similarity across time periods (and allow the number of classes to be unknown and possibly varying over time).

Some theoretical results on the convergence of the procedures have been proven, mainly for static graphs. In the static SBM, Celisse et al. (2012) prove the consistency of the MLE and variational estimates as the number of nodes increases, and Bickel et al. (2013) establish their asymptotic normality. Mariadassou and Matias (2015) have a different approach and give sufficient conditions for the posterior distribution of the groups to converge to a Dirac mass located at the actual groups configuration, for every parameter in a neighborhood of the true one. Rohe et al. (2011) give asymptotic results on the normalized graph Laplacian and its eigenvectors for the spectral clustering algorithm, allowing the number of clusters to grow with the number of nodes. They also provide bounds on the number of misclustered nodes, requiring an assumption on the degree distribution. Lei and Rinaldo (2015) prove consistency for the recovery of communities in the spectral clustering on the adjacency matrix, with milder conditions on the degrees, and also extend this result to degree-corrected stochastic block models. Klopp et al. (2017) derive oracle inequalities for the connection probabilities estimator and obtain minimax estimation rates, including the sparse case where the density of edges converges to zero as the number of nodes increases, thus extending previous results of Gao et al. (2015). Gaucher and Klopp (2019) propose a bound on the risk of the maximum likelihood estimator of network connection probabilities, and show that it is minimax optimal in the sparse graphon model.

In the dynamic setting, fewer theoretical results have been established. Pensky (2019) derives a penalized least squares estimator of the connection probabilities adaptive to the number of blocks and which does not require knowledge of the number of classes $Q$. She shows that it satisfies an oracle inequality. Under the additional assumption that at most $n_{0}$ nodes change groups between two time steps, this estimator attains minimax lower bounds for the risk. She also introduces a dynamic graphon model and shows that the estimators (that do not require knowledge of a degree of smoothness of the graphon function) are minimax optimal within a logarithmic factor of the number of time steps. Based on the same dynamic SBM with at most $n_{0}$ nodes changing groups between two time steps, Pensky et al. (2019) give an upper bound for the (non-asymptotic) error of their estimators of the connection probabilities matrix and group memberships (and also an estimator for the number of clusters). Han et al. (2015) show consistency (as the number of time steps increases but the number of nodes is fixed) of two estimators of the class memberships for dynamic SBM (and more generally multi-graph SBM) in which the node memberships are constant over time but the connection probabilities are allowed to vary and the considered graphs are binary and symmetric. They show that the spectral clustering (on the mean graph over time) estimator of the class memberships is consistent under some stationarity and ergodicity conditions on the connection probabilities. They also prove that the MLE of the class memberships is consistent (i.e. that the fraction of misclustered nodes converges to $0$) in the general case (without any structure on the connection probabilities), provided certain sufficient conditions are satisfied. In their multi-layer model, Paul and Chen (2016) give minimax rates of misclassification under certain conditions on the growth of the types of relations, number of nodes and number of classes, extending the result of Han et al. (2015).

Here, we consider a dynamic version of the binary SBM as in Yang et al. (2011), where each node is allowed to change group membership at each time step according to a Markov chain, independently of other nodes. We prove the consistency of the connectivity parameter MLE and, under some additional conditions, of the transition matrix MLE, when the number of nodes and of time steps are increasing. We also give upper bounds on the rates of convergence of these estimators. While these upper bounds are known to be non-optimal in the static case, where asymptotic normality is obtained with classical parametric rates of convergence (Bickel et al., 2013), these are the first to be established in a dynamic setting for the MLE. As already mentioned, the log-likelihood is intractable (except for very small values of the number of nodes $n$ and the number of time steps $T$), as it requires summing over $Q^{nT}$ terms. Thus, while its consistency remains an important result, the estimator cannot be computed. A possible alternative is to rely on a variational estimator to approximate the MLE (see for instance Matias and Miele, 2017). We also establish the consistency of the variational estimator of the connectivity parameter and, under some additional assumptions, that of the variational estimator of the transition matrix, and obtain the same upper bounds on the rates of convergence as for the MLE. In the particular case where the number of time steps $T$ is fixed, we also consider the model of Matias and Miele (2017), in which the connection probabilities are allowed to vary over time, and generalise these results with only the number of nodes increasing. When $T=1$, we not only recover the results of Celisse et al. (2012) but extend these by giving rates of convergence. Unlike the model studied in Han et al. (2015) and Paul and Chen (2016), the node memberships in our model evolve over time. Our context also differs from that of Pensky (2019), which focuses on a least squares estimator.

This article is organized as follows. Section 2 introduces our model and notation. More precisely, Section 2.1 describes the dynamic stochastic block model as introduced in Yang et al. (2011), Section 2.2 gives the assumptions we make on the model parameters, Section 2.3 describes the dynamic stochastic block model as in Matias and Miele (2017) for the finite time case and Section 2.4 states the expression of the likelihood of this model to define the MLE. Section 3 establishes the consistency and upper bounds on the rates of convergence for the MLE of the connection probabilities in Section 3.1 and of the transition matrix in Section 3.2. Section 4 is dedicated to variational estimators: Sections 4.1 and 4.2 establish the consistency of the variational estimators of the connection probabilities and transition matrix, respectively, along with upper bounds on the associated rates of convergence. All the proofs of the main results are postponed to Section 5, except those for the fixed $T$ case that are in Appendix A, while the more technical proofs are deferred to Appendix B.

2 Model and notation

2.1 Dynamic stochastic block model

We consider a set of $n$ vertices, forming a sequence of binary undirected graphs with no self-loops at each time $t=1,\ldots,T$. The case of a set of directed graphs, with or without self-loops, may be handled similarly. These vertices are assumed to be split into $Q$ latent classes, and we denote by $Z^{t}_{i}$ the label of the $i$-th vertex at time $t$. Letting $Z_{i}=(Z_{i}^{1},\dots,Z_{i}^{T})$, we assume that the $\{Z_{i}\}_{1\leq i\leq n}$ are independent and identically distributed (iid) and each $Z_{i}$ is a homogeneous and stationary Markov chain with transition probabilities

$\mathbb{P}(Z_{i}^{t+1}=l\mid Z_{i}^{t}=q)=\gamma_{ql},\quad\forall\,1\leq q,l\leq Q,$

where $\Gamma=(\gamma_{ql})_{1\leq q,l\leq Q}$ is a stochastic matrix, i.e. with nonnegative coefficients and with each row summing to 1. We let $\alpha=(\alpha_{1},\ldots,\alpha_{Q})$ denote the stationary distribution of the Markov chain. For any $i\in\llbracket 1,n\rrbracket$, the probability distribution of $Z_{i}$ is then

$\mathbb{P}_{\theta}(Z_{i})=\alpha_{Z_{i}^{1}}\prod_{t=1}^{T-1}\gamma_{Z_{i}^{t}Z_{i}^{t+1}}.$

We will also denote $Z^{t}=(Z^{t}_{1},\dots,Z^{t}_{n})$ and $Z^{1:T}=(Z^{1},\dots,Z^{T})=(Z^{t}_{i})_{1\leq t\leq T,1\leq i\leq n}$ .
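As a minimal sketch of the two formulas above, the following Python snippet computes the stationary distribution $\alpha$ of a transition matrix $\Gamma$ and the probability of one membership trajectory $Z_{i}$. The function names, the least-squares solver and the numerical values of `gamma` are illustrative assumptions, not taken from the paper; classes are coded $0,\ldots,Q-1$ rather than $1,\ldots,Q$.

```python
import numpy as np

def stationary_distribution(gamma):
    """Solve alpha @ gamma = alpha with sum(alpha) = 1 by least squares."""
    q = gamma.shape[0]
    # Balance equations (gamma^T - I) alpha = 0, stacked with the constraint sum(alpha) = 1.
    a = np.vstack([gamma.T - np.eye(q), np.ones((1, q))])
    b = np.concatenate([np.zeros(q), [1.0]])
    alpha, *_ = np.linalg.lstsq(a, b, rcond=None)
    return alpha

def path_probability(path, gamma, alpha):
    """P_theta(Z_i = path) = alpha[z^1] * prod_t gamma[z^t, z^{t+1}]."""
    prob = alpha[path[0]]
    for current, nxt in zip(path[:-1], path[1:]):
        prob *= gamma[current, nxt]
    return prob

# Hypothetical transition matrix with Q = 2 classes (illustration only).
gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
alpha = stationary_distribution(gamma)
p = path_probability([0, 0, 1], gamma, alpha)
print(alpha, p)
```

For this hypothetical $\Gamma$, the balance equation $0.1\,\alpha_{1}=0.2\,\alpha_{2}$ gives $\alpha=(2/3,1/3)$, so the trajectory $(1,1,2)$ has probability $\tfrac{2}{3}\times 0.9\times 0.1=0.06$.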

Let $X^{t}=\{X^{t}_{ij}\}_{1\leq i,j\leq n}$ be the symmetric binary adjacency matrix of the graph at time $t$, so that for every pair of nodes $1\leq i,j\leq n$, we have $X^{t}_{ii}=0$ and $X^{t}_{ij}=X^{t}_{ji}$. Each $X^{t}$ follows a stochastic block model so that, conditional on the latent groups $\{Z^{t}_{i}\}_{1\leq i\leq n}$, the $\{X^{t}_{ij}\}_{1\leq i,j\leq n}$ are independent Bernoulli random variables

$X_{ij}^{t}\mid Z_{i}^{t}=q,Z_{j}^{t}=l\sim\mathcal{B}(\pi_{ql}),$

where $(\pi_{ql})_{1\leq q,l\leq Q}\in[0,1]^{Q^{2}}$ are the connectivity parameters. More precisely, conditional on the whole sequence of latent groups $\{Z^{t}_{i}\}_{1\leq t\leq T,1\leq i\leq n}$, the graphs $X^{1:T}=(X^{1},\dots,X^{T})$ are assumed to be independent, each $X^{t}$ having a distribution depending only on $\{Z^{t}_{i}\}_{1\leq i\leq n}$. The model is thus parameterized by $\theta=(\Gamma,\pi)$, with $\Gamma=(\gamma_{ql})_{1\leq q,l\leq Q}$ and $\pi=(\pi_{ql})_{1\leq q,l\leq Q}$. Note that $\pi$ is a symmetric matrix in the undirected setup. We denote by $\mathbb{P}_{\theta}$ (resp. $\mathbb{E}_{\theta}$) the probability distribution (resp. expectation) of all the random variables $\{Z_{i}^{t},X^{t}_{ij}\}_{t\geq 1;i,j\geq 1}$, under the parameter value $\theta$. In the following, we assume that we observe $\{X^{t}_{ij}\}_{1\leq i,j\leq n,\;1\leq t\leq T}$ and we denote by $\theta^{*}=(\Gamma^{*},\pi^{*})=((\gamma^{*}_{ql})_{1\leq q,l\leq Q},(\pi^{*}_{ql})_{1\leq q,l\leq Q})$ the true parameter value, with corresponding probability distribution $\mathbb{P}_{\theta^{*}}$ and expectation $\mathbb{E}_{\theta^{*}}$, and by $\alpha^{*}=(\alpha_{q}^{*})_{1\leq q\leq Q}$ the (true) stationary distribution corresponding to the transition matrix $\Gamma^{*}$. We also let $\mathds{1}_{A}$ denote the indicator function of the set $A$ and $A^{c}$ the complementary set of $A$ in the ambient set. For any integer $M\geq 1$, the set $\llbracket 1,M\rrbracket$ is the set of integers between $1$ and $M$. For any finite set $A$, let $|A|$ denote its cardinality. For any configuration $z^{1:T}$, we denote by $N_{q}(z^{t})$ (resp. $N_{ql}(z^{1:T})$) the number of nodes assigned to class $q$ by the configuration $z^{t}$ (resp. the number of transitions from class $q$ to class $l$ in configuration $z^{1:T}$), that is

$N_{q}(z^{t})=|\{i\in\llbracket 1,n\rrbracket;z_{i}^{t}=q\}|$ and $N_{ql}(z^{1:T})=\sum_{t=1}^{T-1}\sum_{i=1}^{n}\mathds{1}_{z_{i}^{t}=q,z_{i}^{t+1}=l}.$

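We also define for any two parameters $\theta=(\Gamma,\pi)$ and $\theta^{\prime}=(\Gamma^{\prime},\pi^{\prime})$ the following distances

$\|\pi-\pi^{\prime}\|_{\infty}=\max_{1\leq q,l\leq Q}|\pi_{ql}-\pi^{\prime}_{ql}|$ and $\|\Gamma-\Gamma^{\prime}\|_{\infty}=\max_{1\leq q,l\leq Q}|\gamma_{ql}-\gamma^{\prime}_{ql}|.$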
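The generative model of this section can be sketched in Python as follows: memberships follow independent Markov chains started from $\alpha$, and at each time step the edges are independent Bernoulli draws given the memberships. This is an illustrative simulation under assumed parameter values, not code from the paper; the names `simulate_dynamic_sbm`, `class_counts`, `transition_counts` and `sup_distance` are hypothetical, and classes are coded $0,\ldots,Q-1$.

```python
import numpy as np

def simulate_dynamic_sbm(n, T, gamma, pi, alpha, rng):
    """Draw memberships Z (T x n) and symmetric adjacency matrices X (T x n x n)."""
    Q = gamma.shape[0]
    Z = np.empty((T, n), dtype=int)
    Z[0] = rng.choice(Q, size=n, p=alpha)                  # Z_i^1 ~ alpha
    for t in range(1, T):
        for i in range(n):
            Z[t, i] = rng.choice(Q, p=gamma[Z[t - 1, i]])  # one Markov transition per node
    X = np.zeros((T, n, n), dtype=int)
    iu = np.triu_indices(n, k=1)                           # pairs i < j, so no self-loops
    for t in range(T):
        probs = pi[Z[t][:, None], Z[t][None, :]]           # entry (i, j) is pi_{Z_i^t Z_j^t}
        X[t][iu] = rng.random(len(iu[0])) < probs[iu]      # Bernoulli draws for i < j
        X[t] = X[t] + X[t].T                               # symmetrize
    return Z, X

def class_counts(z_t, Q):
    """N_q(z^t): number of nodes in each class at one time step."""
    return np.bincount(z_t, minlength=Q)

def transition_counts(Z, Q):
    """N_ql(z^{1:T}): number of transitions q -> l over all nodes and time steps."""
    counts = np.zeros((Q, Q), dtype=int)
    np.add.at(counts, (Z[:-1].ravel(), Z[1:].ravel()), 1)
    return counts

def sup_distance(a, b):
    """Sup-norm distance between two Q x Q parameter matrices."""
    return np.max(np.abs(a - b))

# Hypothetical parameter values, chosen only for illustration.
gamma = np.array([[0.9, 0.1], [0.2, 0.8]])
alpha = np.array([2 / 3, 1 / 3])          # stationary distribution of gamma
pi = np.array([[0.8, 0.1], [0.1, 0.3]])
Z, X = simulate_dynamic_sbm(30, 5, gamma, pi, alpha, np.random.default_rng(0))
print(class_counts(Z[0], 2), transition_counts(Z, 2))
```

The transition counts always sum to $n(T-1)$, which gives a quick sanity check on a simulated configuration.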

2.2 Assumptions

The assumptions we make on the model parameters are the following.

1. For every $1\leq q\neq q^{\prime}\leq Q$, there exists some $l\in\llbracket 1,Q\rrbracket$ such that $\pi_{ql}\neq\pi_{q^{\prime}l}$.

2. There exists some $0<\delta<1/Q$ such that for any $(q,l)\in\llbracket 1,Q\rrbracket^{2}$, we have $\gamma_{ql}\in[\delta,1-\delta]$.

3. There exists some $\zeta>0$ such that for any $(q,l)\in\llbracket 1,Q\rrbracket^{2}$, we have $\pi_{ql}\in[\zeta,1-\zeta]$.

Assumption 1 is necessary for identifiability of the model. Indeed, if it does not hold, we cannot distinguish between classes $q$ and $q^{\prime}$. Assumption 2 ensures that each Markov chain $Z_{i}$ is irreducible, aperiodic and recurrent. This assumption could be weakened at the cost of technicalities. In particular, it implies that the stationary distribution $\alpha$ exists. Moreover, Assumption 2 also implies that for any $q\in\llbracket 1,Q\rrbracket$, we have $\alpha_{q}\in[\delta,1-\delta]$. Note that this can be seen as an equivalent of Assumption 2 in Celisse et al. (2012) (on the probability distribution of the class memberships) in the dynamic case. Celisse et al. (2012) however also make an additional assumption, an empirical version of this one, which states that the observed class proportions are bounded away from $0$ and which holds with high probability. We do not make such an assumption and use the fact that the probability of this event converges to $1$. Assumption 3 is technical and could also be weakened with additional technicalities. For example, Celisse et al. (2012) also consider the case $\pi_{ql}\in\{0,1\}$ (i.e. $\pi_{ql}\in\{0,1\}\cup[\zeta,1-\zeta]$) whereas we do not. The whole parameter set defined by these constraints is denoted by $\Theta$. In the following, we assume that $\theta^{*}\in\Theta$.
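The three assumptions can be checked mechanically for a candidate parameter. The sketch below is only an illustration: the function name `check_assumptions` and the numerical values are hypothetical, and the test of Assumption 1 relies on exact floating-point comparison, which a practical check would replace by a tolerance.

```python
import numpy as np

def check_assumptions(gamma, pi, delta, zeta):
    """Return three booleans, one per assumption (1, 2, 3) of this section."""
    Q = pi.shape[0]
    # Assumption 1: any two distinct rows of pi differ in at least one column l.
    a1 = all(np.any(pi[q] != pi[r]) for q in range(Q) for r in range(q + 1, Q))
    # Assumption 2: 0 < delta < 1/Q and every gamma_ql lies in [delta, 1 - delta].
    a2 = (0 < delta < 1 / Q) and np.all((gamma >= delta) & (gamma <= 1 - delta))
    # Assumption 3: zeta > 0 and every pi_ql lies in [zeta, 1 - zeta].
    a3 = (zeta > 0) and np.all((pi >= zeta) & (pi <= 1 - zeta))
    return bool(a1), bool(a2), bool(a3)

# Hypothetical parameter values, for illustration only.
gamma = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([[0.8, 0.1], [0.1, 0.3]])
print(check_assumptions(gamma, pi, delta=0.05, zeta=0.05))

# Two classes with identical connectivity rows cannot be told apart: Assumption 1 fails.
pi_flat = np.full((2, 2), 0.5)
print(check_assumptions(gamma, pi_flat, delta=0.05, zeta=0.05))
```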

In what follows, we work up to label permutation on the groups. Indeed, as in any latent group model, the parameters can only be recovered up to label switching on the latent groups. We then define the following notation for any permutation $\sigma\in\mathfrak{S}_{Q}$, with $\mathfrak{S}_{Q}$ the set of permutations on $\llbracket 1,Q\rrbracket$:

$\theta_{\sigma}=(\Gamma_{\sigma},\pi_{\sigma})=\left((\gamma_{\sigma(q)\sigma(l)})_{1\leq q,l\leq Q},(\pi_{\sigma(q)\sigma(l)})_{1\leq q,l\leq Q}\right).$
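Label switching can be made concrete with a short Python sketch: relabelling the classes by $\sigma$ permutes rows and columns of $\Gamma$ and $\pi$ simultaneously, and an estimate is compared to a reference parameter by minimising the sup-norm distance over all $Q!$ permutations. The names `permute_parameter` and `min_sup_distance` and the numerical values are illustrative assumptions; the exhaustive search is only practical for small $Q$.

```python
import itertools
import numpy as np

def permute_parameter(gamma, pi, sigma):
    """Return (Gamma_sigma, pi_sigma), with entries gamma[sigma(q), sigma(l)] and pi[sigma(q), sigma(l)]."""
    idx = np.asarray(sigma)
    return gamma[np.ix_(idx, idx)], pi[np.ix_(idx, idx)]

def min_sup_distance(pi_ref, pi_est):
    """min over sigma of ||pi_ref - (pi_est)_sigma||_inf, by exhaustive search over permutations."""
    Q = pi_ref.shape[0]
    return min(
        np.max(np.abs(pi_ref - pi_est[np.ix_(s, s)]))
        for s in itertools.permutations(range(Q))
    )

# Hypothetical values: pi_est equals pi_ref up to swapping the two class labels.
gamma = np.array([[0.9, 0.1], [0.2, 0.8]])
pi_ref = np.array([[0.8, 0.1], [0.1, 0.3]])
_, pi_est = permute_parameter(gamma, pi_ref, (1, 0))

naive = np.max(np.abs(pi_ref - pi_est))   # distance without relabelling
best = min_sup_distance(pi_ref, pi_est)   # distance up to label switching
print(naive, best)
```

Here the naive distance is $0.5$ although the two matrices describe the same model, while the distance up to label switching is $0$.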

2.3 Finite time case

If the number of time steps $T$ is fixed, it is possible to let the connection probabilities vary over time. We then consider this case, the connection parameter now being $\pi^{1:T}=(\pi^{1},\ldots,\pi^{T})$ with $\pi^{t}=(\pi^{t}_{ql})_{1\leq q,l\leq Q}$ for every $t\in\llbracket 1,T\rrbracket$ and $\pi_{ql}^{t}=\mathbb{P}_{\theta}(X^{t}_{ij}=1\>|\>Z^{t}_{i}=q,Z^{t}_{j}=l)$ for any $(t,q,l)\in\llbracket 1,T\rrbracket\times\llbracket 1,Q\rrbracket^{2}$ . Note that this is the more general model of Matias and Miele (2017), in which the model parameter is $\theta=(\Gamma,\pi^{1:T})$ . Moreover, we introduce the following Assumptions 1’ and 3’ that are alternate versions of Assumptions 1 and 3 respectively for the finite time case.

1’. For every $t\in\llbracket 1,T\rrbracket$, for every $1\leq q\neq q^{\prime}\leq Q$, there exists some $l\in\llbracket 1,Q\rrbracket$ such that $\pi^{t}_{ql}\neq\pi^{t}_{q^{\prime}l}$.

3’. There exists some $\zeta>0$ such that for every $t\in\llbracket 1,T\rrbracket$, for any $(q,l)\in\llbracket 1,Q\rrbracket^{2}$, we have $\pi^{t}_{ql}\in[\zeta,1-\zeta]$.

Assumption 1’ (resp. Assumption 3’) expresses that for every $t\in\llbracket 1,T\rrbracket$ , $\pi^{t}$ satisfies Assumption 1 (resp. Assumption 3). We also introduce the following additional assumption, which ensures (together with Assumption 1’) that the model is identifiable (up to a label permutation). See Matias and Miele (2017).

4. For every $q\in\llbracket 1,Q\rrbracket$, for every $t_{1},t_{2}\in\llbracket 1,T\rrbracket$, $\pi^{t_{1}}_{qq}=\pi^{t_{2}}_{qq}\coloneqq\pi_{qq}$ and $\{\pi_{qq};q\in\llbracket 1,Q\rrbracket\}$ are $Q$ distinct values.

Assumption 4 states that the diagonal of $\pi$ does not change over time, and that its values are distinct. We denote by $\Theta^{T}$ the set of parameters satisfying Assumptions 1’, 2, 3’ and 4. As before, we assume in the following that $\theta^{*}\in\Theta^{T}$ in the fixed $T$ case. We also define as before for any $\pi^{1:T}$ and $\pi^{\prime 1:T}$ the distance

$\|\pi^{1:T}-\pi^{\prime 1:T}\|_{\infty}=\max_{(q,l,t)\in\llbracket 1,Q\rrbracket^{2}\times\llbracket 1,T\rrbracket}|\pi_{ql}^{t}-\pi_{ql}^{\prime t}|.$

2.4 Likelihood

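The conditional log-likelihood and the log-likelihood are, respectively,

$\ell_{c}(\theta;Z^{1:T})=\log\mathbb{P}_{\theta}(X^{1:T}\mid Z^{1:T})$ and $\ell(\theta)=\log\mathbb{P}_{\theta}(X^{1:T})$.

We then denote the maximum likelihood estimator (MLE) by

$\hat{\theta}=(\hat{\Gamma},\hat{\pi})=\operatorname*{argmax}_{\theta\in\Theta}\ell(\theta).$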

In the next section, we study separately the consistency of the connectivity parameter estimator $\hat{\pi}$ and that of the transition matrix estimator $\hat{\Gamma}$ .
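To illustrate why the log-likelihood is intractable, the following Python sketch evaluates $\ell(\theta)=\log\mathbb{P}_{\theta}(X^{1:T})$ by brute force, summing over all $Q^{nT}$ latent configurations. It assumes the Bernoulli form of the conditional likelihood implied by the model of Section 2.1 and the stationary initial distribution $\alpha$; the function names and parameter values are hypothetical, and the approach is only feasible for tiny $n$ and $T$.

```python
import itertools
import numpy as np

def conditional_loglik(X, Z, pi):
    """log P_theta(X^{1:T} | Z^{1:T}): independent Bernoulli edges over pairs i < j."""
    T, n, _ = X.shape
    iu = np.triu_indices(n, k=1)
    total = 0.0
    for t in range(T):
        p = pi[Z[t][:, None], Z[t][None, :]][iu]   # pi_{Z_i^t Z_j^t} for i < j
        x = X[t][iu]
        total += np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    return total

def log_prior(Z, gamma, alpha):
    """log P_theta(Z^{1:T}): independent stationary Markov chains, one per node."""
    lp = np.sum(np.log(alpha[Z[0]]))
    for t in range(Z.shape[0] - 1):
        lp += np.sum(np.log(gamma[Z[t], Z[t + 1]]))
    return lp

def brute_force_loglik(X, gamma, pi, alpha):
    """log P_theta(X^{1:T}) by enumerating all Q^(nT) configurations (tiny n, T only)."""
    T, n, _ = X.shape
    Q = gamma.shape[0]
    terms = []
    for flat in itertools.product(range(Q), repeat=n * T):
        Z = np.array(flat).reshape(T, n)
        terms.append(conditional_loglik(X, Z, pi) + log_prior(Z, gamma, alpha))
    terms = np.array(terms)
    m = terms.max()
    return m + np.log(np.sum(np.exp(terms - m)))   # log-sum-exp for numerical stability

# Hypothetical parameters and a toy observation with n = 3 nodes and T = 2 time steps.
gamma = np.array([[0.9, 0.1], [0.2, 0.8]])
alpha = np.array([2 / 3, 1 / 3])                   # stationary distribution of gamma
pi = np.array([[0.8, 0.1], [0.1, 0.3]])
n, T = 3, 2
X = np.zeros((T, n, n), dtype=int)
X[0, 0, 1] = X[0, 1, 0] = 1                        # a single edge at time 1
ell = brute_force_loglik(X, gamma, pi, alpha)      # sums over 2^6 = 64 configurations
print(ell, 2 * ell / (n * (n - 1) * T))
```

Already for $n=10$, $T=5$ and $Q=2$ the sum would have $2^{50}$ terms, which is why approximations such as variational methods are used in practice.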

3 Consistency of the maximum likelihood estimate

3.1 Connectivity parameter

We first prove the consistency of the maximum likelihood estimator of the connectivity parameter $\pi=(\pi_{ql})_{1\leq q,l\leq Q}$ when the number of nodes and time steps increase. We denote the normalized log-likelihood by

$M_{n,T}(\Gamma,\pi)=\frac{2}{n(n-1)T}\ell(\theta)=\frac{2}{n(n-1)T}\log\mathbb{P}_{\theta}(X^{1:T})$

and introduce, for any $A=(a_{ql})_{1\leq q,l\leq Q}\in\mathcal{A}$, where $\mathcal{A}$ is the set of $Q\times Q$ stochastic matrices, the quantities

$\mathbb{M}(\pi,A)$ and $\mathbb{M}(\pi)$

with $\bar{A}_{\pi}=\operatorname*{argmax}_{A\in\mathcal{A}}\mathbb{M}(\pi,A)$. It is worth noticing that $\mathbb{M}(\pi)$, which will be the limiting value for $M_{n,T}(\Gamma,\pi)$ when $n$ and $T$ increase (see below), does not depend on $\Gamma$.

Theorem 1.

For any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity, if $\log(T)=o(n)$ , we have for all $\epsilon>0$

[TABLE]

We then deduce the consistency of the maximum likelihood estimator of the connection probabilities from the following corollary, which also provides an upper bound on the rate of convergence of this estimator.

Corollary 1.

For any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity such that $r_{n,T}=o(n^{1/4})$ and if $\log(T)=o(n)$ , we have for every $\epsilon>0$

[TABLE]

We now seek analogous consistency results when the number of time steps $T$ is fixed and only the number of nodes $n$ increases. In that case, denoting by $\hat{\theta}=(\hat{\Gamma},\hat{\pi}^{1:T})$ the MLE of $\theta$ , we have the following corollary, which is the counterpart of Corollary 1.

Corollary 2.

If the number of time steps $T$ is fixed, we have for every $\epsilon>0$ and for any sequence $\{r_{n}\}_{n\geq 1}$ increasing to infinity such that $r_{n}=o(n^{1/4})$

[TABLE]

denoting $\hat{\pi}^{1:T}_{\sigma}=(\hat{\pi}_{\sigma}^{t})_{t\in\llbracket 1,T\rrbracket}$ .

This result states that $\min_{\sigma\in\mathfrak{S}_{Q}}\|\pi^{*1:T}-\hat{\pi}_{\sigma}^{1:T}\|_{\infty}$ converges to $0$ in $\mathbb{P}_{\theta^{*}}$ -probability as $n$ increases, i.e. the MLE of the connection probabilities is consistent up to label switching, and gives an upper bound on the rate of convergence of this estimator. The particular case $T=1$ is then a stronger result than that of Celisse et al. (2012), where no rate of convergence is given.

Remark 1.

Note that in Corollaries 1 and 2, the results still hold for any sequences $r_{n,T}$ and $r_{n}$ increasing to infinity, respectively. However, we are interested in sequences increasing slowly to infinity, giving the strongest results, namely the smallest lower bounds. Indeed, whenever these assumptions are not satisfied, the lower bounds appearing in the inequalities are larger, and the results may even become trivial.

3.2 Latent transition matrix

We now prove that the MLE for the transition matrix $\Gamma$ is consistent when the number of nodes and time steps increase.

Lemma 1.

Any critical point $\breve{\theta}=(\breve{\Gamma},\breve{\pi})$ of the likelihood function $\ell(\cdot)$ is such that $\breve{\Gamma}$ satisfies the fixed point equation

[TABLE]

There are two possible cases for the MLE $\hat{\theta}$ :

• Either $\hat{\theta}$ is a critical point of the likelihood function. Then $\hat{\Gamma}$ satisfies equation (4).

• Or $\hat{\theta}$ is not a critical point (this can happen if it belongs to the boundary of $\Theta$ ) and we assume that there exists $\breve{\Gamma}$ such that $(\breve{\Gamma},\hat{\pi})\in\Theta$ and $(\breve{\Gamma},\hat{\pi})$ satisfies equation (4) (at least for $n$ and $T$ large enough). We then choose as our estimator $(\breve{\Gamma},\hat{\pi})$ . By an abuse of notation, we will denote this estimator $\hat{\theta}=(\hat{\Gamma},\hat{\pi})$ and call it MLE in the following.

In what follows, for any fixed configuration $z^{1:T}$ , any $\theta\in\Theta$ and any $\epsilon>0$ , we consider the event

[TABLE]

The following result establishes that asymptotically, any estimator that correctly estimates the connectivity parameter $\pi$ also recovers the group memberships. This result is similar to Theorem 1 in Mariadassou and Matias (2015).

Theorem 2.

For any estimator $\breve{\theta}\in\Theta$ (at least for $n$ and $T$ large enough), if $\log(T)=o(n)$ , there exist some positive constants $C,C_{1},C_{2},C_{3},C_{4}$ such that for any $\epsilon>0$ , for any positive sequence $\{y_{n,T}\}_{n,T\geq 1}$ such that $\log(1/y_{n,T})=o(n)$ , any $\eta\in(0,\delta)$ and for $n$ and $T$ large enough, we have

[TABLE]

whenever $\{v_{n,T}\}_{n,T\geq 1}$ is a sequence decreasing to $0$ such that $v_{n,T}=o(\sqrt{\log(nT)}/n)$ .

Theorem 3.

If $\log(T)=o(n)$ , for any $\epsilon>0$ and $\{r_{n,T}\}_{n,T\geq 1}$ any sequence increasing to infinity such that $r_{n,T}=o\left(\sqrt{nT/\log n}\right)$ , we have for any $\sigma\in\mathfrak{S}_{Q}$

[TABLE]

with $\{v_{n,T}\}_{n,T\geq 1}$ a sequence decreasing to $0$ such that $v_{n,T}=o(\sqrt{\log(nT)}/n)$ .

Corollary 3.

Assume that $\log(T)=o(n)$ and $\min_{\sigma\in\mathfrak{S}_{Q}}\|\hat{\pi}_{\sigma}-\pi^{*}\|_{\infty}=o_{\mathbb{P}_{\theta^{*}}}(v_{n,T})$ with $\{v_{n,T}\}_{n,T\geq 1}$ a sequence decreasing to $0$ such that $v_{n,T}=o(\sqrt{\log(nT)}/n)$ . Then for any $\epsilon>0$ and $\{r_{n,T}\}_{n,T\geq 1}$ any sequence increasing to infinity such that $r_{n,T}=o\left(\sqrt{nT/\log n}\right)$ , we have the convergence

[TABLE]

Remark 2.

Note that the upper bound obtained in Corollary 1 on the rate of convergence in probability of $\hat{\pi}$ does not ensure that $\min_{\sigma\in\mathfrak{S}_{Q}}\|\hat{\pi}_{\sigma}-\pi^{*}\|_{\infty}=o_{\mathbb{P}_{\theta^{*}}}(v_{n,T})$ holds. While the latter has never been established (to our knowledge), it is a reasonable assumption.

We want a result equivalent to that of Corollary 3 when the number of time steps $T$ is fixed and the connection probabilities vary over time (the connection parameter being $\pi=\pi^{1:T}=(\pi^{1},\ldots,\pi^{T})$ with $\pi^{t}=(\pi^{t}_{ql})_{q,l}$ ). To this end, we need an equivalent of Theorem 2 in that case.

Theorem 4.

For any fixed $T\geq 2$ , for any estimator $\breve{\theta}\in\Theta^{T}$ (at least for $n$ large enough), there exist some positive constants $C,C_{1},C_{2},C_{3},C_{4}$ such that for any $\epsilon>0$ , for any positive sequence $\{y_{n}\}_{n\geq 1}$ such that $\log(1/y_{n})=o(n)$ , any $\eta\in(0,\delta)$ and for $n$ large enough, we have

[TABLE]

whenever $\{v_{n}\}_{n\geq 1}$ is a sequence decreasing to $0$ such that $v_{n}=o(\sqrt{\log(n)}/n)$ .

The following corollary gives the expected result.

Corollary 4.

Let the number of time steps $T\geq 2$ be fixed. Assume that $\min_{\sigma\in\mathfrak{S}_{Q}}\|\hat{\pi}^{1:T}_{\sigma}-\pi^{*1:T}\|_{\infty}=o_{\mathbb{P}_{\theta^{*}}}(v_{n})$ with $\{v_{n}\}_{n\geq 1}$ a sequence decreasing to $0$ such that $v_{n}=o(\sqrt{\log(n)}/n)$ . Then for any $\epsilon>0$ and $\{r_{n}\}_{n\geq 1}$ any sequence increasing to infinity such that $r_{n}=o\left(\sqrt{n/\log n}\right)$ , we have the convergence

[TABLE]

The proof of Corollary 4 is the same as that of Corollary 3, but relying on Theorem 4 instead of Theorem 2 and is therefore omitted.

Remark 3.

As in Remark 1 for Corollaries 1 and 2, the results of Corollaries 3 and 4 still hold for sequences $r_{n,T}$ and $r_{n}$ increasing to infinity at any rate.

4 Variational estimators

In practice, we cannot compute the MLE except for very small values of $n$ and $T$ , because it involves a summation over all the $Q^{nT}$ possible latent configurations. Nor can we use the Expectation-Maximization (EM) algorithm to approximate it, because it requires the conditional distribution of the latent variables given the observations, which is not tractable. A common solution is to use the Variational Expectation-Maximization (VEM) algorithm, which optimizes a lower bound of the log-likelihood (see for example Daudin et al. (2008)). Let us denote $Z^{t}_{iq}=\mathds{1}_{Z^{t}_{i}=q}$ for every $t,i$ and $q$ . Using the same approach as in Matias and Miele (2017) for the VEM algorithm in the dynamic SBM, we consider a variational approximation of the conditional distribution of the latent variable $Z^{1:T}$ given the observed variable $X^{1:T}$ in the class of probability distributions parameterized by $\chi=(\tau,\eta)=\left(\{\tau^{t}_{iq}\}_{t,i,q},\{\eta^{t}_{iql}\}_{t,i,q,l}\right)$ of the form

[TABLE]

i.e. with $\mathbb{Q}_{\chi}$ such that $\mathbb{E}_{\mathbb{Q}_{\chi}}\left[Z^{t}_{iq}Z^{t+1}_{il}\right]=\eta^{t}_{iql}$ and $\mathbb{E}_{\mathbb{Q}_{\chi}}\left[Z^{t}_{iq}\right]=\tau^{t}_{iq}$ . Notice that $\mathbb{Q}_{\chi}(Z^{t+1}_{i}=l\>|\>Z^{t}_{i}=q)=\eta_{iql}^{t}/\tau_{iq}^{t}=\eta_{iql}^{t}/\sum_{q^{\prime}=1}^{Q}\eta^{t}_{iqq^{\prime}}$ . The quantity to optimize in the VEM algorithm is then

[TABLE]

with $KL(\cdot,\cdot)$ denoting the Kullback-Leibler divergence and $\mathcal{H}(\cdot)$ denoting the entropy. Define

[TABLE]

and the variational estimator of $\theta$

[TABLE]

Moreover, we denote $\tilde{\chi}=(\tilde{\tau},\tilde{\eta})=\hat{\chi}(\tilde{\theta})=(\hat{\tau}(\tilde{\theta}),\hat{\eta}(\tilde{\theta}))$ . In practice, the VEM algorithm is an iterative algorithm that maximizes the function $\mathcal{J}$ alternately with respect to $\chi$ and $\theta$ in order to find $\tilde{\theta}$ .
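The relations between $\tau$ and $\eta$ noted above, namely $\tau^{t}_{iq}=\sum_{q^{\prime}}\eta^{t}_{iqq^{\prime}}$ and $\mathbb{Q}_{\chi}(Z^{t+1}_{i}=l\>|\>Z^{t}_{i}=q)=\eta^{t}_{iql}/\tau^{t}_{iq}$ , can be illustrated for a single node and a single time step. The following Python sketch is only an illustration: the function name and the nested-list layout `eta_it[q][l]` are assumptions, and it requires every $\tau^{t}_{iq}$ to be positive. It also returns the next marginal $\tau^{t+1}_{il}=\sum_{q}\eta^{t}_{iql}$ , which follows from $\sum_{q}Z^{t}_{iq}=1$ .

```python
def variational_transitions(eta_it):
    """From eta_it[q][l] = E_Q[Z^t_iq Z^{t+1}_il], recover marginals and transitions.

    Hypothetical helper for one node i and one time step t; labels are 0-based.
    """
    Q = len(eta_it)
    # tau^t_iq: sum of eta over the arrival class l.
    tau = [sum(row) for row in eta_it]
    # Conditional transition probabilities under Q_chi: eta / tau, row by row.
    cond = [[eta_it[q][l] / tau[q] for l in range(Q)] for q in range(Q)]
    # tau^{t+1}_il: sum of eta over the departure class q.
    tau_next = [sum(eta_it[q][l] for q in range(Q)) for l in range(Q)]
    return tau, cond, tau_next


# Joint probabilities for two classes (entries sum to one).
eta_example = [[0.5, 0.1], [0.1, 0.3]]
tau, cond, tau_next = variational_transitions(eta_example)
```

Each row of `cond` is a probability vector, so `cond` is a stochastic matrix specific to the pair (node, time step).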

4.1 Connectivity parameter

Theorem 5.

For any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity, if $\log(T)=o(n)$ , we have for all $\epsilon>0$

[TABLE]

The following corollary establishes the consistency of the variational estimators of the connection probabilities as $n$ and $T$ increase.

Corollary 5.

For any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity such that $r_{n,T}=o(n^{1/4})$ , we have for any $\epsilon>0$

[TABLE]

We have the following equivalent corollary for a fixed number of time steps.

Corollary 6.

If the number of time steps $T$ is fixed, we have for every $\epsilon>0$ and for any sequence $\{r_{n}\}_{n\geq 1}$ increasing to infinity such that $r_{n}=o(n^{1/4})$

[TABLE]

Remark 4.

As for Corollaries 1 to 4, the results of Corollaries 5 and 6 still hold for any sequences $r_{n,T}$ and $r_{n}$ increasing to infinity.

4.2 Latent transition matrix

We now prove that $\tilde{\Gamma}$ is consistent when the number of nodes and time steps increase.

Lemma 2.

Any critical point $(\breve{\chi},\breve{\theta})$ of the function $\mathcal{J}(\cdot,\cdot)$ is such that $\breve{\Gamma}$ satisfies the fixed-point equation

[TABLE]

We assume that $(\tilde{\chi},\tilde{\theta})$ is a critical point of $\mathcal{J}(\cdot,\cdot)$ . Then we have the fixed-point equation

[TABLE]

The following theorem gives the consistency and a rate of convergence of this estimator, under an assumption on the rate of convergence of $\tilde{\pi}$ .

Theorem 6.

If $\log(T)=o(n)$ , for any $\epsilon>0$ and $\{r_{n,T}\}_{n,T\geq 1}$ any sequence increasing to infinity such that $r_{n,T}=o\left(\sqrt{nT/\log n}\right)$ and for any $\sigma\in\mathfrak{S}_{Q}$

[TABLE]

with $\{v_{n,T}\}_{n,T\geq 1}$ a sequence decreasing to $0$ such that $v_{n,T}=o(\sqrt{\log(nT)}/n)$ .

Corollary 7.

Assume that $\log(T)=o(n)$ and $\min_{\sigma\in\mathfrak{S}_{Q}}\|\tilde{\pi}_{\sigma}-\pi^{*}\|_{\infty}=o_{\mathbb{P}_{\theta^{*}}}(v_{n,T})$ with $\{v_{n,T}\}_{n,T\geq 1}$ a sequence decreasing to $0$ such that $v_{n,T}=o(\sqrt{\log(nT)}/n)$ . Then for any $\epsilon>0$ and $\{r_{n,T}\}_{n,T\geq 1}$ any sequence increasing to infinity such that $r_{n,T}=o\left(\sqrt{nT/\log n}\right)$ , we have the convergence

[TABLE]

The proof of Corollary 7 is the same as that of Corollary 3, using Theorem 6 instead of Theorem 3 and is therefore omitted.

When the number of time steps $T$ is fixed and the connection probabilities can vary over time, we have the following Corollary that is the equivalent of Corollary 7.

Corollary 8.

Let the number of time steps $T\geq 2$ be fixed. Assume that $\min_{\sigma\in\mathfrak{S}_{Q}}\|\tilde{\pi}^{1:T}_{\sigma}-\pi^{*1:T}\|_{\infty}=o_{\mathbb{P}_{\theta^{*}}}(v_{n})$ with $\{v_{n}\}_{n\geq 1}$ a sequence decreasing to $0$ such that $v_{n}=o(\sqrt{\log(n)}/n)$ . Then for any $\epsilon>0$ and $\{r_{n}\}_{n\geq 1}$ any sequence increasing to infinity such that $r_{n}=o\left(\sqrt{n/\log n}\right)$ , we have the convergence

[TABLE]

The proof of Corollary 8 is the same as that of Corollary 7, but relying on Theorem 4 instead of Theorem 2 and is therefore omitted.

Remark 5.

As for Corollaries 1 to 6, the results of Corollaries 7 and 8 still hold for any sequences $r_{n,T}$ and $r_{n}$ increasing to infinity.

5 Proofs of main results

5.1 Proof of Theorem 1

The proof follows the lines of the proof of Theorem 3.6 in Celisse et al. (2012). Nonetheless, our result is sharper as we establish an upper bound on the rate of convergence (in probability) of the normalized likelihood. We fix some $\theta\in\Theta$ and introduce the quantities

[TABLE]

Note that $\tilde{Z}^{1:T}$ is a random variable that depends on $Z^{1:T}$ and that

[TABLE]

Similarly, for any $t\in\llbracket 1,T\rrbracket$ , we have $\tilde{Z}^{t}=\operatorname*{argmax}_{z\in\llbracket 1,Q\rrbracket^{n}}\mathbb{E}_{\theta^{*}}\left[\log\mathbb{P}_{\theta}(X^{t}\>|\>Z^{t}=z)\>|\>Z^{t}\right]$ .

We bound the difference between $M_{n,T}(\Gamma,\pi)$ and $\mathbb{M}(\pi)$ by introducing three intermediate terms so that we can write, for any sequence $\{r_{n,T}\}_{n,T\geq 1}$ and any $\epsilon>0$

[TABLE]

In the following, we prove separately the convergence (in $\mathbb{P}_{\theta^{*}}$ -probability) to zero of the three terms of this sum (while controlling for the rate of these convergences). Before starting, let us remark that we have

[TABLE]

In particular, for every $t\in\llbracket 1,T\rrbracket$ , we have

[TABLE]

First term of the right-hand side of (10).

We let

[TABLE]

Lemma 3.

For every $t\in\llbracket 1,T\rrbracket$ , we have

[TABLE]

Going back to (13) and applying Lemma 3, we get

[TABLE]

Now, using classical dependency rules in directed acyclic graphs (see e.g. Lauritzen, 1996) combined with Assumption 2, we get

[TABLE]

This implies that $\mathbb{P}_{\theta^{*}}(\sup_{\theta\in\Theta}T_{1}>\epsilon r_{n,T}/(3\sqrt{n}))=0$ as soon as $\epsilon r_{n,T}/\sqrt{n}\geq 6\log(1/\delta)/(n-1)$ . Then for any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity, for any $\epsilon>0$ , we have that $\mathbb{P}_{\theta^{*}}(\sup_{\theta\in\Theta}T_{1}>\epsilon r_{n,T}/(3\sqrt{n}))\to 0$ as $n$ and $T$ increase.

Second term of the right-hand side of (10).

Let us denote

[TABLE]

For the sake of clarity, we study this term on the event $\{Z^{1:T}=z^{*1:T}\}$ where $z^{*1:T}\in\llbracket 1,Q\rrbracket^{nT}$ is a fixed configuration. This event induces the definition of $\tilde{Z}^{1:T}$ following Equation (8) as

[TABLE]

or equivalently for every $t\in\llbracket 1,T\rrbracket$ ,

[TABLE]

By definition of $\hat{z}^{1:T}$ and $\tilde{Z}^{1:T}$ respectively, we have the two inequalities

[TABLE]

and

[TABLE]

implying the lower and upper bounds

[TABLE]

Taking the absolute value gives us an upper bound for $T_{2}(z^{*1:T})$

[TABLE]

Using Equations (11) and (12), we then obtain the following upper bound for $T_{2}(z^{*1:T})$

[TABLE]

We use the following concentration result to conclude.

Lemma 4.

Let $\epsilon,\beta>0$ and $\{x_{n,T}\}_{n,T\geq 1}$ a sequence of positive real numbers. We let $\mathbb{P}^{*}_{\theta^{*}}(\cdot)$ denote the probability conditional on $\{Z^{1:T}=z^{*1:T}\}$ under parameter $\theta^{*}$ , i.e. $\mathbb{P}^{*}_{\theta^{*}}(\cdot)=\mathbb{P}_{\theta^{*}}(\cdot\>|\>Z^{1:T}=z^{*1:T})$ . Denoting $\Lambda=2\log[(1-\zeta)/\zeta]>0$ we have for any $\theta\in\Theta$

[TABLE]

with $\Omega=(1+\beta)\Lambda\sqrt{n(n-1)T/2}+\Lambda\sqrt{n(n-1)Tx_{n,T}/4}+(1/\beta+1/3)(\Lambda/2)x_{n,T}$ .
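To get a feeling for the size of $\Omega$ , the following Python sketch evaluates the expression above and compares it with the number $n(n-1)T/2$ of observed dyad-time triples, taking $x_{n,T}=\log(n)$ . This is an illustration only: the function names, the default values $\zeta=0.1$ and $\beta=1$ , and the choice of normalization are assumptions made for the example.

```python
import math


def lemma4_omega(n, T, zeta, beta, x):
    """Evaluate Omega from Lemma 4 for given n, T, zeta in (0, 1/2), beta > 0 and x > 0."""
    lam = 2.0 * math.log((1.0 - zeta) / zeta)  # Lambda = 2 log[(1 - zeta) / zeta]
    pairs_T = n * (n - 1) * T
    return (
        (1.0 + beta) * lam * math.sqrt(pairs_T / 2.0)
        + lam * math.sqrt(pairs_T * x / 4.0)
        + (1.0 / beta + 1.0 / 3.0) * (lam / 2.0) * x
    )


def normalized_omega(n, T, zeta=0.1, beta=1.0):
    """Omega with x = log(n), divided by the number n(n-1)T/2 of dyad-time triples.

    Assumption: this normalization is chosen for illustration only.
    """
    omega = lemma4_omega(n, T, zeta, beta, math.log(n))
    return omega / (n * (n - 1) * T / 2.0)


# The normalized quantity shrinks as the number of nodes grows (T fixed here).
ratios = [normalized_omega(n, T=5) for n in (10, 100, 1000)]
```

Since the leading terms of $\Omega$ grow like $n\sqrt{T}$ (up to a $\sqrt{\log n}$ factor) while the number of triples grows like $n^{2}T$ , the ratio decreases with $n$ .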

Let us choose $x_{n,T}=\log(n)$ in the above lemma. For any $\epsilon>0$ , for any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity, we have for $n$ and $T$ large enough

[TABLE]

Then for $n$ and $T$ large enough, the first term in the right-hand side of inequality (4) is equal to $0$ and we have

[TABLE]

Third term of the right-hand side of (10).

Let us denote

[TABLE]

For any fixed configuration $z^{t}\in\llbracket 1,Q\rrbracket^{n}$ , analogous to Equation (12), we write

[TABLE]

where $C_{qq^{\prime}}(Z^{t},z^{t})=|\{i\in\llbracket 1,n\rrbracket;Z^{t}_{i}=q,z_{i}^{t}=q^{\prime}\}|$ is the (random variable) number of nodes classified in group $q$ in the current (random) configuration $Z^{t}$ , while they belong to group $q^{\prime}$ in (deterministic) configuration $z^{t}$ . Recall that $N_{q}(z^{t})$ is the number of nodes assigned to class $q$ by the configuration $z^{t}$ and let us denote $a^{t}_{qq^{\prime}}=a_{qq^{\prime}}(Z^{t},z^{t})=C_{qq^{\prime}}(Z^{t},z^{t})/N_{q}(Z^{t})$ the (random) proportion of vertices from class $q$ in $Z^{t}$ attributed to class $q^{\prime}$ by $z^{t}$ . We write

[TABLE]

with $A^{t}=(a^{t}_{qq^{\prime}})_{1\leq q,q^{\prime}\leq Q}$ .
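The counts $C_{qq^{\prime}}(Z^{t},z^{t})$ , the class sizes $N_{q}(Z^{t})$ and the proportions $a^{t}_{qq^{\prime}}$ are straightforward to compute from two label vectors. The Python sketch below is a minimal illustration: the function name is hypothetical, labels are coded $0,\ldots,Q-1$ rather than $1,\ldots,Q$ , and every class is assumed to be non-empty in $Z^{t}$ so that the divisions are well defined.

```python
def class_confusion(Z_t, z_t, Q):
    """Return (C, N, A) for two configurations of the same n nodes at one time step.

    C[q][qp] = number of nodes with Z_t[i] == q and z_t[i] == qp,
    N[q]     = number of nodes with Z_t[i] == q,
    A[q][qp] = C[q][qp] / N[q].
    """
    C = [[0] * Q for _ in range(Q)]
    for q, qp in zip(Z_t, z_t):
        C[q][qp] += 1
    N = [sum(row) for row in C]
    A = [[C[q][qp] / N[q] for qp in range(Q)] for q in range(Q)]
    return C, N, A


# Five nodes, two classes; the two configurations disagree on the second node only.
C, N, A = class_confusion([0, 0, 0, 1, 1], [0, 1, 0, 1, 1], Q=2)
```

By construction each row of `A` sums to one, which is why $A^{t}$ is a stochastic matrix.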

Now extending these notations to the case where $z^{t}=\tilde{Z}^{t}$ , we let $\tilde{A}^{t}=(\tilde{a}^{t}_{qq^{\prime}})_{1\leq q,q^{\prime}\leq Q}$ where $\tilde{a}^{t}_{qq^{\prime}}=a_{qq^{\prime}}(Z^{t},\tilde{Z}^{t})$ . We remark that the definition of $\tilde{Z}^{t}$ implies that $\tilde{A}^{t}=\operatorname*{argmax}_{A^{t}\in\mathcal{A}^{t}(Z^{1:T})}\Phi^{t}(A^{t},\pi)$ with $\mathcal{A}^{t}(Z^{1:T})$ the (random) subset of stochastic matrices defined for every $t\in\llbracket 1,T\rrbracket$ by

[TABLE]

Let us also denote $\bar{A}_{\pi}^{t}=\operatorname*{argmax}_{A\in\mathcal{A}^{t}(Z^{1:T})}\mathbb{M}(\pi,A)$ . Then

[TABLE]

We start by stating a concentration lemma on the random variable $N_{q}(Z^{t})$ for any $q\in\llbracket 1,Q\rrbracket$ and any $t\in\llbracket 1,T\rrbracket$ .

Lemma 5.

For any $\theta\in\Theta$ and any $\eta\in(0,\delta)$ , let

[TABLE]

Then $\mathbb{P}_{\theta}\left(Z^{1:T}\in\Omega_{\eta}(\theta)\right)\geq 1-QT\exp(-2\eta^{2}n)$ .
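The lower bound $1-QT\exp(-2\eta^{2}n)$ is informative only when $T$ is not too large compared to $\exp(2\eta^{2}n)$ , which is where the condition $\log(T)=o(n)$ comes into play. The following Python sketch, with a hypothetical function name and arbitrary illustrative values, simply evaluates this bound.

```python
import math


def lemma5_lower_bound(n, T, Q, eta):
    """Evaluate 1 - Q * T * exp(-2 * eta^2 * n), the lower bound stated in Lemma 5."""
    return 1.0 - Q * T * math.exp(-2.0 * eta ** 2 * n)


# Moderate number of time steps: the bound is very close to 1.
bound_small_T = lemma5_lower_bound(n=1000, T=10, Q=3, eta=0.1)

# Number of time steps comparable to exp(2 * eta^2 * n): the bound is vacuous (negative).
bound_large_T = lemma5_lower_bound(n=1000, T=10 ** 9, Q=3, eta=0.1)
```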

Building on the previous concentration lemma, the following one gives the convergence in $\mathbb{P}_{\theta^{*}}$ -probability of the second term in the right-hand side of (15).

Lemma 6.

For any $\epsilon>0$ , any $\eta\in(0,\delta)$ and $\{r_{n,T}\}_{n,T\geq 1}$ any positive sequence,

[TABLE]

with $c=6(1-\delta)^{2}(1-\zeta)\log(1/\zeta)Q^{4}$ .

Then taking any $\eta\in(0,\delta)$ , for any $\epsilon>0$ , for any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity, we have the following inequality for $n$ and $T$ large enough

[TABLE]

implying that the probability in Lemma 6 converges to $0$ as $n$ and $T$ increase for any $\epsilon>0$ , as long as $\log T=o(n)$ . Now, for the first term in the right-hand side of (15), note that we have for every $\pi$ and every $t$

[TABLE]

Then, either $\mathbb{M}(\pi,\bar{A}_{\pi}^{t})\leq\Phi^{t}(\tilde{A}^{t},\pi)$ and

[TABLE]

or $\mathbb{M}(\pi,\bar{A}_{\pi}^{t})\geq\Phi^{t}(\tilde{A}^{t},\pi)$ and

[TABLE]

In both cases, we get that $\left|\Phi^{t}(\tilde{A}^{t},\pi)-\mathbb{M}(\pi,\bar{A}_{\pi}^{t})\right|\leq\sup_{A\in\mathcal{A}}\left|\Phi^{t}(A,\pi)-\mathbb{M}(\pi,A)\right|$ for every $t$ and $\pi$ , thus obtaining the upper bound

[TABLE]

Letting

[TABLE]

and recalling that $0\leq a_{ql}\leq 1$ (for every $q,l\in\llbracket 1,Q\rrbracket$ ) for every $A=(a_{ql})_{1\leq q,l\leq Q}\in\mathcal{A}$ , we have

[TABLE]

Finally, we bound the first term of the right-hand-side of (15) as follows

[TABLE]

Applying Markov’s Inequality, we obtain

[TABLE]

The following lemma gives an upper bound of the expectation appearing in the previous inequality, for any $q,l\in\llbracket 1,Q\rrbracket$ .

Lemma 7.

For any $q,l\in\llbracket 1,Q\rrbracket$ and any $t\in\llbracket 1,T\rrbracket$ , we have the following inequality

[TABLE]

This leads to

[TABLE]

Then for any $\epsilon>0$ , for any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity, we have the convergence

[TABLE]

We have proved the convergence to $0$ of the three terms in the right-hand side of (10) for any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity and as long as $\log T=o(n)$ . This gives the expected result and concludes the proof. ∎

5.2 Proof of Corollary 1

To prove this corollary, we establish the following lemma that allows us to obtain a rate of convergence of $\hat{\pi}$ to $\pi^{*}$ from a rate of convergence of $M_{n,T}$ to $\mathbb{M}$ . Note that this lemma is a bit more general than what we need and gives an equivalent result when the number of time steps $T$ is fixed, which will be useful for Corollary 2.

Lemma 8.

Let $\{F_{n,T}\}_{n,T\geq 1}$ be any random functions on the set $\Theta$ (resp. $\Theta^{T}$ ) and $\mathbb{M}$ (resp. $\mathbb{M}^{T}$ ) defined as before. Assume that there exists a sequence $\{v_{n,T}\}_{n,T\geq 1}$ (resp. $\{v_{n}\}_{n\geq 1}$ ) decreasing to $0$ such that for every $\epsilon>0$ , we have the following convergence as $n,T\rightarrow\infty$ (resp. $n\rightarrow\infty$ )

[TABLE]

If for any $n$ and $T$ , $\hat{\theta}=(\hat{\Gamma},\hat{\pi})$ (resp. $\hat{\theta}=(\hat{\Gamma},\hat{\pi}^{1:T})$ ) is defined as the maximizer of $F_{n,T}$ on the set $\Theta$ , (resp. $\Theta^{T}$ ) we have the following convergence

[TABLE]

with $\hat{\pi}_{\sigma^{1:T}}^{1:T}=(\hat{\pi}_{\sigma^{t}}^{t})_{t\in\llbracket 1,T\rrbracket}$ .

The result of Corollary 1 is then a direct consequence of Theorem 1 (choosing the sequence $\{r_{n,T}^{2}\}_{n,T\geq 1}$ ) and Lemma 8 applied with $F_{n,T}=M_{n,T}$ . ∎

5.3 Proof of Theorem 2

The proof follows the lines of the proof of Theorem 3.8 in Celisse et al. (2012). Nonetheless, our result is sharper as we will establish an upper bound of the rate of convergence (in probability) of the quantity at stake. For any $\epsilon>0$ , any sequence $\{y_{n,T}\}_{n,T\geq 1}$ and $\eta\in(0,\delta)$ , we write

[TABLE]

with $\Omega_{\eta}(\theta^{*})$ as defined in Lemma 5. We will establish that there exist some positive constants $C,C_{1},C_{2},C_{3},C_{4}$ such that for any fixed configuration $z^{*1:T}\in\Omega_{\eta}(\theta^{*})$ , any $\epsilon>0$ , any positive sequence $\{y_{n,T}\}_{n,T\geq 1}$ such that $\log(1/y_{n,T})=o(n)$ and $n$ and $T$ large enough, we have

[TABLE]

Combined with (5.3) and applying Lemma 5, this gives the desired result. So now we focus on establishing (20).

In what follows, we consider a fixed configuration $z^{*1:T}\in\Omega_{\eta}(\theta^{*})$ and introduce the Hamming distance between $z^{*1:T}$ and any other configuration $z^{1:T}$ defined as

[TABLE]

We let $\mathbb{P}^{*}_{\theta^{*}}(\cdot)$ denote the probability conditional on $\{Z^{1:T}=z^{*1:T}\}$ under parameter $\theta=\theta^{*}$ , i.e. $\mathbb{P}^{*}_{\theta^{*}}(\cdot)=\mathbb{P}_{\theta^{*}}(\cdot\>|\>Z^{1:T}=z^{*1:T})$ . In the following, we will often use the fact that the variables $\{X_{ij}^{t}\}$ are independent under $\mathbb{P}^{*}_{\theta^{*}}$ (with mean value $\pi^{*}_{z_{i}^{*t}z_{j}^{*t}}$ ) so that we can rely on Hoeffding’s Inequality. We introduce a sequence $\{v_{n,T}\}_{n,T\geq 1}$ decreasing to 0 and $\Omega_{n,T}$ the event defined as

[TABLE]

We bound the probability of interest in (20) by splitting it on the two complementary events $\Omega_{n,T}$ and $\Omega_{n,T}^{c}$ . For any $\epsilon>0$ and any positive sequence $\{y_{n,T}\}_{n,T\geq 1}$

[TABLE]

Thus, the proof of (20) boils down to establishing the desired upper bound on the second term appearing in the right-hand side of (21). We have

[TABLE]

by using the bound $(Q-1)^{r}\binom{nT}{r}\leq Q^{r}(nT)^{r}$ on the number of terms in the sum over $\{z^{1:T};\|z^{1:T}-z^{*1:T}\|_{0}=r\}$ (for each value of $r$ ). Then,

[TABLE]

as long as $nT\geq Q$ . For any configuration $z^{1:T}$ such that $\|z^{1:T}-z^{*1:T}\|_{0}=r$ , we denote by $r(1),\ldots,r(T)$ the number of differences between the two configurations at each time step $t\in\llbracket 1,T\rrbracket$ , i.e. $r(t)=\|z^{t}-z^{*t}\|_{0}$ such that $r=\sum_{t}r(t)$ . Moreover, for any parameter $\pi$ , we define $D_{n,T}(z^{1:T},\pi)$ the subset of indexes $(i,j,t)\in\llbracket 1,n\rrbracket^{2}\times\llbracket 1,T\rrbracket$ such that $i<j$ for which the parameter $\pi$ differs between the configuration $z^{*1:T}$ and $z^{1:T}$ , namely

[TABLE]

with $I_{n,T}=\{(i,j,t)\in\llbracket 1,n\rrbracket^{2}\times\llbracket 1,T\rrbracket;i<j\}$ the set of indexes over which we sum to compute the conditional log-likelihood. In what follows, we abbreviate to $D^{*}$ (resp. $\breve{D}$ ), the set $D_{n,T}(z^{1:T},\pi^{*})$ (resp. $D_{n,T}(z^{1:T},\breve{\pi})$ ). The next lemma gives a decomposition of the main term at stake in (22).
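The Hamming distance, its per-time-step components $r(t)$ and the counting bound $(Q-1)^{r}\binom{nT}{r}\leq Q^{r}(nT)^{r}$ used above can be checked on small examples. The following Python sketch is purely illustrative: the function names are hypothetical and configurations are stored as lists `z[t][i]` with labels coded $0,\ldots,Q-1$ .

```python
from math import comb


def hamming_per_time(z, z_star):
    """Return [r(1), ..., r(T)]: number of nodes whose labels differ at each time step."""
    return [
        sum(a != b for a, b in zip(z_t, zs_t)) for z_t, zs_t in zip(z, z_star)
    ]


def n_configs_at_distance(n, T, Q, r):
    """Exact number of configurations at Hamming distance r from a fixed one."""
    return comb(n * T, r) * (Q - 1) ** r


def crude_bound(n, T, Q, r):
    """Upper bound Q^r (nT)^r on the number of such configurations."""
    return Q ** r * (n * T) ** r


# Three nodes, two time steps: one difference at t = 1 and two at t = 2.
r_t = hamming_per_time([[0, 1, 1], [1, 1, 0]], [[0, 0, 1], [0, 1, 1]])
r_total = sum(r_t)  # total Hamming distance r = sum over t of r(t)
```

The exact count chooses the $r$ positions among $nT$ where the configurations differ, then one of the $Q-1$ other labels at each of them.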

Lemma 9.

We have the decomposition

[TABLE]

where

[TABLE]

Combining (22) and Lemma 9, we obtain

[TABLE]

We then decompose

[TABLE]

We handle these three terms separately in the following. From now on, we consider a configuration $z^{1:T}$ such that $\|z^{1:T}-z^{*1:T}\|_{0}=r=\sum_{t}r(t)$ .

First term in the right-hand side of (5.3).

Recall that $U_{1}$ is given by (23). We can further decompose this term

[TABLE]

For $n$ and $T$ large enough such that $\breve{\Gamma}\in[\delta,1-\delta]^{Q^{2}}$ (implying for the corresponding stationary distribution $\breve{\alpha}\in[\delta,1-\delta]^{Q}$ ), we have

[TABLE]

To handle the term $U_{1}$ , we need to lower bound the cardinality of the set $D^{*}$ . This is the purpose of Lemma 10 which is a generalization of Proposition B.4 in Celisse et al. (2012). This can be done for all the configurations $z^{1:T}$ and all the configurations $z^{*1:T}$ that belong to some $\Omega_{\eta}(\theta)$ .

Lemma 10.

For any $\eta\in(0,\delta)$ , any parameter $\theta\in\Theta$ , any configuration $z^{1:T}$ and any $z^{*1:T}\in\Omega_{\eta}(\theta)$ such that $\|z^{1:T}-z^{*1:T}\|_{0}=r$ , we have

[TABLE]

Combining Lemma 10 with the previous bound, we get that

[TABLE]

We also have

[TABLE]

with $k(x,y)=x\log(x/y)+(1-x)\log[(1-x)/(1-y)]$ for $(x,y)\in(0,1)^{2}$ . The function $k$ is positive for every $(x,y)$ such that $x\neq y$ , hence, introducing the notation $K^{*}=\min_{q,l,q^{\prime},l^{\prime};\pi^{*}_{ql}\neq\pi^{*}_{q^{\prime}l^{\prime}}}k(\pi^{*}_{ql},\pi^{*}_{q^{\prime}l^{\prime}})/2$ ,

[TABLE]

So, by (28), we have for $n$ large enough

[TABLE]

This leads to

[TABLE]

for any $u>0$ and large enough $n$ . Moreover, thanks to Hoeffding’s Inequality and Assumption 3,

[TABLE]

where $C_{\zeta}$ is a constant depending on $\zeta$ . Finally using Lemma 10, we have

[TABLE]
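The function $k$ used in the bound on this first term is the Kullback-Leibler divergence between two Bernoulli distributions with parameters $x$ and $y$ , and $K^{*}$ is half of its smallest value over pairs of distinct entries of $\pi^{*}$ . The Python sketch below illustrates both quantities. The function names are hypothetical, entries are compared with exact floating-point inequality (an assumption that suits hand-written examples only), and the minimum is undefined if all entries of the matrix coincide.

```python
import math
from itertools import product


def k(x, y):
    """k(x, y) = x log(x / y) + (1 - x) log((1 - x) / (1 - y)) for x, y in (0, 1)."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def K_star(pi):
    """Half of the minimum of k over pairs of distinct entries of the matrix pi."""
    Q = len(pi)
    vals = [pi[q][l] for q in range(Q) for l in range(Q)]
    # Raises ValueError when every entry of pi takes the same value.
    return min(k(a, b) for a, b in product(vals, vals) if a != b) / 2


# Two classes with within-class probability 0.5 and between-class probability 0.25.
pi_example = [[0.5, 0.25], [0.25, 0.5]]
K_example = K_star(pi_example)
```

Note that $k$ is not symmetric in its arguments, so both orderings of each pair of entries enter the minimum.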

Second term in the right-hand side of (5.3).

We have

[TABLE]

For any $q,l,q^{\prime},l^{\prime}\in\llbracket 1,Q\rrbracket$ , we introduce the sets

[TABLE]

Then we bound

[TABLE]

For every $u>0$ , we thus have

[TABLE]

We start by dealing with the first term of the right-hand side of (5.3). Notice that on the event $\Omega_{n,T}$ , we have $\left|(\breve{\pi}_{ql}-\pi^{*}_{ql})/(\pi^{*}_{ql}(1-\pi^{*}_{ql}))\right|\leq v_{n,T}/\zeta^{2}$ for every $q,l\in\llbracket 1,Q\rrbracket$ . The next lemma establishes that any set $D_{n,T}(z^{1:T},\pi)$ is included in a larger set, whose cardinality is bounded. In particular, the random set $\breve{D}$ is included in a larger deterministic subset.

Lemma 11.

Let $z^{1:T}$ and $z^{*1:T}$ denote two configurations such that $\|z^{1:T}-z^{*1:T}\|_{0}=r$ . Then for any parameter $\pi=(\pi_{ql})_{1\leq q,l\leq Q}$ , we have

[TABLE]

As the set $G_{ql}$ is random (because $\breve{D}$ is random), we write

[TABLE]

where now $D$ is a deterministic set. By a union bound and Hoeffding’s inequality, we have for any $D\subset D_{n,T}(z^{1:T})$

[TABLE]

This leads to

[TABLE]

For the second term of (5.3), we get from a union bound and from Lemma 11 (that gives an upper bound for $|D^{*}\cup\breve{D}|$ ) that

[TABLE]

because $|\pi^{*}_{q^{\prime}l^{\prime}}-\pi^{*}_{ql}|\leq 1$ , implying that

[TABLE]

Finally, we have the following upper bound for the second term of (5.3)

[TABLE]

Third term in the right-hand side of (5.3).

We want to bound (in probability) the last term $U_{3}$ . Distinguishing between the cases where $X_{ij}^{t}=0$ and $X_{ij}^{t}=1$ , we have

[TABLE]

For any $(q,l)\in\llbracket 1,Q\rrbracket^{2}$ , we further introduce the sets

[TABLE]

Centering the $X_{ij}^{t}$ (under the distribution $\mathbb{P}^{*}_{\theta^{*}}$ ), we get

[TABLE]

Then, on the event $\Omega_{n,T}$ and for $n$ and $T$ large enough such that $|(\breve{\pi}_{ql}-\pi^{*}_{ql})/(1-\pi^{*}_{ql})|\leq 1/2$ and $|(\breve{\pi}_{ql}-\pi^{*}_{ql})/\pi^{*}_{ql}|\leq 1/2$ for every $q$ and $l$ , using the fact that $|\log(1+x)|\leq 2|x|$ for $x\in[-1/2,1/2]$ , we have

[TABLE]

Then, for every $u>0$ ,

[TABLE]

For the first term of (31), using Hoeffding’s inequality as before,

[TABLE]

For the second term of (31), we use

[TABLE]

Finally, we have the following upper bound for the third term of (5.3)

[TABLE]

Combining the 3 bounds on the right-hand-side of (5.3).

[TABLE]

Now we choose the sequence $v_{n,T}$ such that $v_{n,T}=o(\sqrt{\log(nT)}/n)$ which is sufficient to imply that the quantities $\mathbb{P}^{*}_{\theta^{*}}\left(v_{n,T}>\zeta^{2}\log(nT)/(4Q^{2}n)\right)$ and $\mathbb{P}^{*}_{\theta^{*}}\left(v_{n,T}>\log(nT)\zeta/(16n)\right)$ vanish as $n$ and $T$ increase. For large enough values of $n$ and $T$ and with $C_{1}$ , $C_{2},C_{3},C_{4}$ and $\kappa$ positive constants only depending on $Q,\zeta$ and $K^{*}$ , we then have

[TABLE]

Let us introduce

[TABLE]

Now we go back to (26). Noticing that the number of configurations $z^{1:T}$ such that $\|z^{1:T}-z^{*1:T}\|_{0}=r$ is equal to $\dbinom{nT}{r}(Q-1)^{r}$ , we have

[TABLE]

Finally, notice that as long as $\log T=o(n)$ and $\log(1/y_{n,T})=o(n)$ (resp. as long as $v_{n,T}=o(\sqrt{\log(nT)}/n)$ ), the quantity $nTu_{nT}$ (resp. $nTw_{nT}$ ) converges to 0. Then we obtain for some universal positive constant $C$ and large enough $n$ and $T$

[TABLE]

This leads directly to inequality (20). ∎

5.4 Proof of Theorem 3

We fix some $\sigma\in\mathfrak{S}_{Q}$ and study the convergence in $\mathbb{P}_{\theta^{*}}$ -probability of $\hat{\gamma}_{\sigma(q)\sigma(l)}$ to $\gamma^{*}_{ql}$ with $\hat{\Gamma}$ as defined by the fixed point equation (4), i.e.

[TABLE]

First, let us denote

[TABLE]

Then we can write the quantity at stake as

[TABLE]

to obtain the following upper bound on the probability of interest

[TABLE]

First term of the right-hand side of (33).

For the first term in (33), for any $0<\lambda<\delta$ (implying $\lambda<\alpha^{*}_{q}$ for any $q\in\llbracket 1,Q\rrbracket$ ),

[TABLE]

First, we upper bound the probability $\mathbb{P}_{\theta^{*}}\left(\left|A_{q,l}-\alpha^{*}_{q}\gamma^{*}_{ql}\right|>\epsilon r_{n,T}\frac{\sqrt{\log n}}{\sqrt{nT}}\right)$ for any $\epsilon>0$ , using the following lemma.

Lemma 12.

If $\log(T)=o(n)$ , for any $\epsilon>0$ , for any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity such that $r_{n,T}=o\left(\sqrt{nT/\log n}\right)$ and any $\eta\in(0,\delta)$ , we have for any $\sigma\in\mathfrak{S}_{Q}$

[TABLE]

with $v_{n,T}$ a sequence decreasing to $0$ such that $v_{n,T}=o\left(\sqrt{\log(nT)}/n\right)$ .

Then, for the second term of (5.4), notice that $B_{q}=\sum_{l=1}^{Q}A_{q,l}$ and $\sum_{l=1}^{Q}\gamma^{*}_{ql}=1$ . We then have, if $\log(T)=o(n)$ and $v_{n,T}=o\left(\sqrt{\log(nT)}/n\right)$ , using Lemma 12 again,

[TABLE]

Finally, for the first term of (33), if $y_{n,T}$ is such that $1/y_{n,T}=o\left(\sqrt{nT/\log(n)}\right)$ , if $v_{n,T}=o\left(\sqrt{\log(nT)}/n\right)$ and as long as $\log(T)=o(n)$ , we obtain

[TABLE]

Second term of the right-hand side of (33).

For the second term of (33), we split it on two complementary events as before. For any $0<\lambda<\delta$ , we have

[TABLE]

We already gave an upper bound on the second term in the right-hand side of (36). Let us give one for the first term. Notice that as $\alpha^{*}_{q}\geq\delta$ and if $B_{q}\geq\alpha^{*}_{q}-\lambda\geq\delta-\lambda>0$ , we have by the mean value theorem

[TABLE]

We can then write for the first term in the right-hand side of (36), as long as $\log(T)=o(n)$ , for $\{y_{n,T}\}_{n,T\geq 1}$ such that $1/y_{n,T}=o\left(\sqrt{nT/\log n}\right)$ and with $v_{n,T}$ such that $v_{n,T}=o\left(\sqrt{\log(nT)}/n\right)$ , still using Lemma 12

[TABLE]

We finally obtain for the second term of the right-hand side of (33)

[TABLE]

We conclude the proof by summing the upper bounds obtained in (35) and (37)

[TABLE]

and by noticing that $\mathbb{P}_{\theta^{*}}(\|\hat{\Gamma}_{\sigma}-\Gamma^{*}\|_{\infty}>\epsilon r_{n,T}\sqrt{\log n}/\sqrt{nT})\leq\sum_{1\leq q,l\leq Q}\mathbb{P}_{\theta^{*}}(|\hat{\gamma}_{\sigma(q)\sigma(l)}-\gamma^{*}_{ql}|>\epsilon r_{n,T}\sqrt{\log n}/\sqrt{nT})$ . ∎

5.5 Proof of Corollary 3

Denoting by $\sigma_{n,T}$ the permutation minimizing the distance between $\hat{\pi}$ (permuted) and $\pi^{*}$ for every $(n,T)\in\llbracket 1,n\rrbracket\times\llbracket 1,T\rrbracket$ , i.e. $\sigma_{n,T}=\operatorname*{argmin}_{\sigma\in\mathfrak{S}_{Q}}\|\hat{\pi}_{\sigma}-\pi^{*}\|_{\infty}$ , we apply Theorem 3 to $\hat{\theta}_{\sigma_{n,T}}$ in order to get

[TABLE]

∎
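The label-switching alignment used in this proof, $\sigma_{n,T}=\operatorname*{argmin}_{\sigma\in\mathfrak{S}_{Q}}\|\hat{\pi}_{\sigma}-\pi^{*}\|_{\infty}$, can be illustrated numerically. The following is a minimal sketch, assuming the convention $\pi_{\sigma}=(\pi_{\sigma(q)\sigma(l)})_{q,l}$ and a brute-force search over all $Q!$ permutations; the function names and the toy matrices are illustrative and do not come from the paper.

```python
from itertools import permutations


def permute(pi, sigma):
    """Return pi_sigma, whose (q, l) entry is pi[sigma[q]][sigma[l]]."""
    Q = len(pi)
    return [[pi[sigma[q]][sigma[l]] for l in range(Q)] for q in range(Q)]


def sup_distance(a, b):
    """Entrywise sup-norm distance between two Q x Q matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_permutation(pi_hat, pi_star):
    """Brute-force argmin over label permutations (reasonable for small Q only)."""
    Q = len(pi_star)
    return min(permutations(range(Q)),
               key=lambda s: sup_distance(permute(pi_hat, s), pi_star))


# Toy connectivity matrix (assumed values) and a relabelled copy of it.
pi_star = [[0.9, 0.2, 0.1], [0.2, 0.6, 0.3], [0.1, 0.3, 0.4]]
relabel = (2, 0, 1)                      # hypothetical label switch
pi_hat = permute(pi_star, relabel)       # an "estimate" known up to relabelling
sigma = best_permutation(pi_hat, pi_star)
```

In this toy case the search returns the inverse of `relabel`, and the aligned matrix coincides with `pi_star`. For larger $Q$ the factorial cost of the exhaustive search would call for another matching strategy.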

5.6 Proof of Theorem 5

We use the following lemma, which states that the quantity optimized in the VEM algorithm and the log-likelihood are asymptotically equivalent.

Lemma 13.

We have the following inequality $\mathbb{P}_{\theta^{*}}$ -a.s.

[TABLE]

We have that for any $\epsilon>0$ , for $n$ and $T$ large enough,

[TABLE]

We then conclude by combining this result with Theorem 1. ∎

5.7 Proof of Corollary 5

This is a direct consequence of Theorem 5 and Lemma 8 applied with the functions $F_{n,T}=\frac{2}{n(n-1)T}\mathcal{J}(\hat{\chi}(\cdot),\cdot)$ . ∎

5.8 Proof of Theorem 6

This proof is quite similar to that of Theorem 3. We fix some $\sigma\in\mathfrak{S}_{Q}$ and study the convergence in $\mathbb{P}_{\theta^{*}}-$ probability of $\tilde{\gamma}_{\sigma(q)\sigma(l)}$ to $\gamma^{*}_{ql}$ with $\tilde{\Gamma}$ as defined by the fixed point equation (5), i.e.

[TABLE]

First, let us denote

[TABLE]

Then we can write the quantity at stake as

[TABLE]

We follow the line of the proof of Theorem 3, using Lemma 14 below instead of Lemma 12 in order to obtain the result.

Lemma 14.

For any $\epsilon>0$ , for any sequence $\{r_{n,T}\}_{n,T\geq 1}$ increasing to infinity such that $r_{n,T}=o\left(\sqrt{nT/\log n}\right)$ and any $\eta\in(0,\delta)$ , we have for any $\sigma\in\mathfrak{S}_{Q}$

[TABLE]

with $v_{n,T}$ a sequence decreasing to $0$ such that $v_{n,T}=o(\sqrt{\log(nT)}/n)$.

∎

Acknowledgement

Work partly supported by the grant ANR-18-CE02-0010 of the French National Research Agency ANR (project EcoNet).

Appendix A Proofs of main results for the finite time case

A.1 Proof of Corollary 2

When the number of time steps is fixed and the connection probabilities vary over time, the conditional log-likelihood is

[TABLE]

and the likelihood $\ell^{T}(\theta)$ is defined as in (2) with $\ell^{T}_{c}(\cdot)$ instead of $\ell_{c}(\cdot)$ . The maximum likelihood estimator is then

[TABLE]

As before, we denote the normalized log-likelihood $M_{n,T}(\Gamma,\pi^{1:T})=2/(n(n-1)T)\ell^{T}(\theta)$ . We introduce the following limiting quantity

[TABLE]

We follow the lines of the proof of Theorem 1 in order to prove that we have for any sequence $y_{n}\to+\infty$ , for all $\epsilon>0$

[TABLE]

Choosing $y_{n}=r_{n}^{2}$ , we then use Lemma 8 to conclude that, as $r_{n}^{2}/\sqrt{n}=o(1)$ by assumption, for any $\epsilon>0$ ,

[TABLE]

In particular, for every $t\in\llbracket 1,T\rrbracket$ , $\hat{\pi}^{t}$ converges in $\mathbb{P}_{\theta^{*}}$ -probability to $\pi^{*t}$ up to label switching. Then, let us prove that on the event $\{\min_{\sigma^{1},\ldots,\sigma^{T}\in\mathfrak{S}_{Q}}\|\hat{\pi}^{1:T}-\pi_{\sigma^{1:T}}^{*1:T}~\|_{\infty}\leq\epsilon r_{n}n^{-1/4}\}$ (whose probability converges to $1$ ), for $n$ large enough, the permutation $\sigma^{t}$ minimizing the distance between $\pi^{*t}$ and $\hat{\pi}_{\sigma^{t}}^{t}$ is the same for every $t\in\llbracket 1,T\rrbracket$ . We consider $n$ large enough such that $\epsilon r_{n}n^{-1/4}<\min_{1\leq q\neq l\leq Q}|\pi^{*}_{qq}-\pi^{*}_{ll}|/4$ . Denoting by $\sigma_{m}^{1},\ldots,\sigma_{m}^{T}$ the permutations (depending on $n$ ) minimizing $\|\hat{\pi}^{1:T}-\pi_{\sigma^{1:T}}^{*1:T}~\|_{\infty}$ , we have that, for any $1\leq t\neq t^{\prime}\leq T$ , if some $q,l\in\llbracket 1,Q\rrbracket$ are such that $\sigma_{m}^{t}(q)=\sigma_{m}^{t^{\prime}}(l)$ , then

[TABLE]

and on the event we consider

[TABLE]

implying that $q=l$ . This means that on this event, the permutation $\sigma_{m}^{t}$ minimizing the distance between $\pi^{*t}$ and $\hat{\pi}_{\sigma^{t}}^{t}$ is the same for every $t\in\llbracket 1,T\rrbracket$ . We can conclude that

[TABLE]

∎

A.2 Proof of Theorem 4

First, let us introduce some notation, as in the proof of Theorem 2. For any fixed configuration $z^{*1:T}\in\Omega_{\eta}$, we define for any configuration $z^{1:T}$ and any parameter $\theta$

[TABLE]

and for any $1\leq t\leq T$

[TABLE]

and, as before, we abbreviate the set $D_{n,T}(z^{1:T},\pi^{*1:T})$ (resp. $D_{n,T}(z^{1:T},\breve{\pi}^{1:T})$) to $D^{*}$ (resp. $\breve{D}$). We also introduce for any $q,l,q^{\prime},l^{\prime}\in\llbracket 1,Q\rrbracket$ the quantities $F_{qlq^{\prime}l^{\prime}}$, $F_{ql}$, $G_{qlq^{\prime}l^{\prime}}$ and $G_{ql}$ as before, according to this definition of $D_{n,T}(z^{1:T},\pi^{1:T})$. Finally, we introduce for any $t\in\llbracket 1,T\rrbracket$ and $q,l,q^{\prime},l^{\prime}\in\llbracket 1,Q\rrbracket$ the quantities

[TABLE]

Note that we can get an equivalent of Lemma 10 with a similar proof that gives that for any configuration $z^{*1:T}$ in $\Omega_{\eta}$ , for any configuration $z^{1:T}$ and any $\theta\in\Theta^{T}$ ,

[TABLE]

In the same way, we have an equivalent of Lemma 11 (with a similar proof) that gives that for any $z^{t}$ and $z^{*t}$ two configurations at time $t$ such that $\|z^{t}-z^{*t}\|_{0}=r(t)$ and any parameter $\pi^{t}=(\pi^{t}_{ql})_{1\leq q,l\leq Q}$ , we have

[TABLE]

Going back to the proof of Theorem 4, we follow the line of that of Theorem 2, with a few changes. We get the same decomposition as in equation (26), replacing $\pi$ by $\pi^{1},\ldots,\pi^{T}$ in the definitions of $U_{1}$ , $U_{2}$ and $U_{3}$ , and replacing the event $\Omega_{n,T}$ by $\Omega_{n}=\{\|\hat{\pi}^{1:T}-\pi^{*1:T}\|_{\infty}\leq v_{n}\}$ . For $U_{1}$ , the proof does not change. For $U_{2}$ , we write (instead of (5.3))

[TABLE]

For every $u>0$ , we thus have

[TABLE]

We start by dealing with the first term of (A.2). Notice that on the event $\Omega_{n}$ , we have $\left|\breve{\pi}^{t}_{ql}-\pi^{*t}_{ql}\right|/(\pi^{*t}_{ql}(1-\pi^{*t}_{ql}))\leq v_{n}/\zeta^{2}$ for every $q,l\in\llbracket 1,Q\rrbracket$ . As the set $G^{t}_{ql}$ is random (because $\breve{D}^{t}$ is random), we write for every $t\in\llbracket 1,T\rrbracket$ , using (39),

[TABLE]

where now $D$ is a deterministic set. By a union bound and Hoeffding’s inequality, we have for any $D\subset D^{t}_{n,T}(z^{t})$

[TABLE]

This leads to, for the first term of (A.2),

[TABLE]

For the second term of (A.2), we get from a union bound and from (39) that

[TABLE]

Finally, we have the following upper bound for $U_{2}$

[TABLE]

For the third term $U_{3}$ , denoting $G^{*t}_{ql}=\cup_{1\leq q^{\prime},l^{\prime}\leq Q}G^{t}_{ql}=\{(i,j)\in D^{*t}\cup\breve{D}^{t};z_{i}^{*t}=q,z_{j}^{*t}=l\}$ , we have

[TABLE]

Then, we have on the event $\Omega_{n}$ and for $n$ large enough such that $|(\breve{\pi}^{t}_{ql}-\pi^{*t}_{ql})/\pi^{*t}_{ql}|\leq 1/2$ and $|(\breve{\pi}^{t}_{ql}-\pi^{*t}_{ql})/(1-\pi^{*t}_{ql})|\leq 1/2$ for every $q$ and $l$ , using the fact that $|\log(1+x)|\leq 2|x|$ for $x\in[-1/2,1/2]$ ,

[TABLE]

Then, for every $u>0$ ,

[TABLE]

For the first term of (41), using Hoeffding’s inequality as before,

[TABLE]

and for the second term of (41),

[TABLE]

Finally, we have the following upper bound for $U_{3}$

[TABLE]

Now we choose the sequence $v_{n}$ such that $v_{n}=o(\sqrt{\log n}/n)$ which is sufficient to imply that the quantities $\mathbb{P}^{*}_{\theta^{*}}\left(v_{n}>\zeta^{2}\log(nT)/(4Q^{2}Tn)\right)$ and $\mathbb{P}^{*}_{\theta^{*}}\left(v_{n}>\zeta\log(nT)/(16Tn)\right)$ vanish as $n$ increases and we gather the three upper bounds. For large enough values of $n$ and with $C_{1}$ , $C_{2}$ , $C_{3}$ , $C_{4}$ and $\kappa$ positive constants only depending on $Q$ , $\zeta$ , $K^{*}$ and $T$ , we then have

[TABLE]

Then, introducing

[TABLE]

we conclude as in the proof of Theorem 2, noticing that $nTu_{nT}$ (resp. $nTw_{nT}$ ) converges to 0 as $n$ increases as long as $\log(1/y_{n})=o(n)$ (resp. as long as $v_{n}=o(\sqrt{\log(n)}/n)$ ). ∎

A.3 Proof of Corollary 6

As in the proof of Theorem 5, using the convergence in Equation (38) and Lemma 13, we obtain for any $\epsilon>0$

[TABLE]

We then conclude by using Lemma 8 applied with $F_{n,T}=\frac{2}{n(n-1)T}\mathcal{J}(\hat{\chi}(\cdot),\cdot)$ . ∎

Appendix B Proofs of technical lemmas

B.1 Proof of Lemma 1

As in the proof of Lemma E.2 from Celisse et al. [2012], we use the method of Lagrange multipliers to find the fixed-point equation of the critical point. Recall that $\theta=(\Gamma,\pi)$ and let us denote the likelihood $L(\Gamma,\pi)\coloneqq\exp\ell(\theta)=\mathbb{P}_{\theta}(X^{1:T})$ and the conditional likelihood $L_{c}(z^{1:T},\pi)=\mathbb{P}_{\theta}(X^{1:T}\>|\>Z^{1:T}=z^{1:T})$ . Recall the definition of $N_{ql}(z^{1:T})$ in (1) and that

[TABLE]

We compute the derivative of the Lagrangian with respect to each parameter $\gamma_{ql}$ .

[TABLE]

At the critical point $\breve{\theta}=(\breve{\gamma},\breve{\pi})$ , we obtain that for each $(q,l)\in\llbracket 1,Q\rrbracket^{2}$ we have

[TABLE]

where $\propto$ means ’proportional to’. The constraint $\sum_{l}\gamma_{ql}=1$ gives the normalizing term and we obtain

[TABLE]

∎
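The last step of this proof, where the constraint $\sum_{l}\gamma_{ql}=1$ turns a proportionality relation into an estimate, is a row normalisation. The sketch below illustrates that step only, on a simplified hard-assignment analogue: it assumes a known configuration $z^{1:T}$ and takes the weights to be the numbers of observed transitions from class $q$ to class $l$. This is not the fixed-point equation of Lemma 1 itself, and the function names and the toy configuration are illustrative.

```python
def transition_counts(z, Q):
    """Count transitions q -> l over all nodes and consecutive time steps.

    z[i][t] is the (0-based) class of node i at time t.
    """
    counts = [[0] * Q for _ in range(Q)]
    for path in z:
        for a, b in zip(path[:-1], path[1:]):
            counts[a][b] += 1
    return counts


def normalise_rows(weights):
    """Divide each row by its sum, so that every row sums to one."""
    out = []
    for row in weights:
        total = sum(row)
        if total == 0:
            raise ValueError("a class with no outgoing transition has no estimate")
        out.append([w / total for w in row])
    return out


# Toy configuration (assumed): n = 3 nodes, T = 4 time steps, Q = 2 classes.
z = [[0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
counts = transition_counts(z, Q=2)
gamma = normalise_rows(counts)
```

Each row of `gamma` is a probability vector by construction, which is exactly what the Lagrange multiplier enforces.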

B.2 Proof of Lemma 2

We can write the quantity to optimize

[TABLE]

Using this expression, we can obtain directly the expected fixed-point equation for the variational estimator of the transition probability from $q$ to $l$ . ∎

B.3 Proof of Lemma 3

We rely on the notation introduced in the proof of Theorem 1. For any $t\in\llbracket 1,T\rrbracket$ , using classical dependency rules in directed acyclic graphs and the expression (9) of $\hat{z}^{t}$ , we write

[TABLE]

and thus

[TABLE]

Using Bayes’ rule, we have

[TABLE]

Taking the expectation of this quantity with respect to any distribution $\mathbb{Q}$ on $Z^{t}$ , we obtain

[TABLE]

where $\mathrm{KL}\left(\mathbb{Q};\mathbb{P}_{\theta}(Z^{t}\>|\>X^{1:t})\right)=\mathbb{E}_{\mathbb{Q}}\left[\log\mathbb{Q}(Z^{t})-\log\mathbb{P}_{\theta}(Z^{t}\>|\>X^{1:t})\right]$ is a Kullback-Leibler divergence (thus non negative) and $\mathcal{H}(\mathbb{Q})=-\mathbb{E}_{\mathbb{Q}}\left[\log\mathbb{Q}(Z^{t})\right]$ is the entropy of $\mathbb{Q}$ .

Taking now $\mathbb{Q}$ as the Dirac distribution located on $\hat{z}^{t}$ , we have $\mathcal{H}(\mathbb{Q})=0$ and

[TABLE]

Now, combining Inequalities (43) and (44), we obtain

[TABLE]

giving the expected result. ∎

B.4 Proof of Lemma 4

To prove this lemma, we first establish a control of the expectation of the random variable appearing in the statement.

Lemma 15.

We have the following inequality for $z^{*1:T}$ and $z^{1:T}$ any configurations and any $\theta\in\Theta$

[TABLE]

with $\Lambda=2\log[(1-\zeta)/\zeta]$ .

We now turn to the proof of Lemma 4. Let us first recall Talagrand’s inequality [see e.g. Massart, 2007, page 170, Equation (5.50)].

Theorem (Talagrand’s inequality).

Let $\{Y_{ij}^{t}\}_{1\leq i<j\leq n,1\leq t\leq T}$ denote independent and centered random variables. Define

[TABLE]

where $\mathcal{G}\subset\mathbb{R}^{n(n-1)T/2}$ . Let us further assume that there exist $b>0$ and $\sigma^{2}>0$ such that $|Y^{t}_{ij}g^{t}_{ij}|\leq b$ for every $(i,j,t)\in\llbracket 1,n\rrbracket^{2}\times\llbracket 1,T\rrbracket$ and any $g\in\mathcal{G}$ and $\sup_{g\in\mathcal{G}}\sum_{i<j}\sum_{t}\mathrm{Var}(Y^{t}_{ij}g^{t}_{ij})\leq\sigma^{2}$ . Then, for every $\beta>0$ and $x>0$ , for any finite set $\{g_{1},\ldots,g_{2^{n(n-1)T/2}}\}$ of elements of $\mathcal{G}$ , we have

[TABLE]

First, notice that $\operatorname*{argmin}_{\varpi\in[\zeta,1-\zeta]}\log(\varpi/(1-\varpi))=\zeta$ and $\operatorname*{argmax}_{\varpi\in[\zeta,1-\zeta]}\log(\varpi/(1-\varpi))=1-\zeta$ so that we have

[TABLE]

with $\varpi\coloneqq\{\varpi_{i,j}^{t}\}_{1\leq i<j\leq n,1\leq t\leq T}$ . The set $\{\zeta,1-\zeta\}^{n(n-1)T/2}$ is finite, of size ${2^{n(n-1)T/2}}$ . Let us now apply Talagrand’s inequality to our setup. Note that for every $(i,j,t)\in\llbracket 1,n\rrbracket^{2}\times\llbracket 1,T\rrbracket$ , for any $\pi\in[\zeta,1-\zeta]^{Q^{2}}$ , we have

[TABLE]

almost surely thanks to Assumption 3, and with $\Lambda$ as defined in Lemma 15. Combining this result with Lemma 15 and writing $\Omega=(1+\beta)\Lambda\sqrt{n(n-1)T/2}+\sqrt{n(n-1)T(\Lambda/2)^{2}x_{n,T}}+(1/\beta+1/3)(\Lambda/2)x_{n,T}$ , we have for any $\epsilon>0$ , for any $\beta>0$ , applying Talagrand’s inequality with $b=\Lambda/2$ and $\sigma^{2}=n(n-1)T/2(\Lambda/2)^{2}$ ,

[TABLE]

∎
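The role of the constant $\Lambda=2\log[(1-\zeta)/\zeta]$ can be checked numerically: since $\varpi\mapsto\log(\varpi/(1-\varpi))$ is increasing, its range over $[\zeta,1-\zeta]$ has length $\Lambda$ and its absolute value is at most $\Lambda/2$, the value used for $b$ above. The sketch below verifies this on a grid; the value of `zeta` and the grid size are arbitrary assumptions.

```python
import math


def logit(p):
    """Log-odds log(p / (1 - p)) for p in (0, 1)."""
    return math.log(p / (1.0 - p))


zeta = 0.1                                   # hypothetical value of the bound zeta
grid = [zeta + k * (1 - 2 * zeta) / 1000 for k in range(1001)]
values = [logit(p) for p in grid]

Lambda = 2 * math.log((1 - zeta) / zeta)
spread = max(values) - min(values)           # attained at the two endpoints
largest = max(abs(v) for v in values)        # compare with Lambda / 2
```

Because the extrema sit at the endpoints $\zeta$ and $1-\zeta$, the supremum over $[\zeta,1-\zeta]$ in each coordinate reduces to a maximum over two points, which is what makes the finite set $\{\zeta,1-\zeta\}^{n(n-1)T/2}$ sufficient.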

B.5 Proof of Lemma 5

For any $\eta\in(0,\delta)$ , Hoeffding’s inequality [see for example Theorem 2.8 from Boucheron et al., 2013] gives that

[TABLE]

which concludes the proof. ∎
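To give a feel for the kind of bound Hoeffding's inequality provides, the sketch below compares the two-sided bound $2\exp(-2n\eta^{2})$ for the mean of $n$ independent $[0,1]$-valued variables with a simulated deviation frequency. It is a generic illustration under assumed values of $n$, $p$ and $\eta$, not a reproduction of the statement of Lemma 5.

```python
import math
import random


def hoeffding_bound(n, eta):
    """Two-sided Hoeffding bound for the mean of n independent [0, 1]-valued variables."""
    return 2.0 * math.exp(-2.0 * n * eta ** 2)


def deviation_frequency(n, p, eta, trials, seed=0):
    """Fraction of simulated Bernoulli(p) samples whose mean is further than eta from p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() < p for _ in range(n)) / n
        hits += abs(mean - p) > eta
    return hits / trials


n, p, eta = 200, 0.3, 0.1                    # assumed toy values
freq = deviation_frequency(n, p, eta, trials=2000)
bound = hoeffding_bound(n, eta)
```

The simulated frequency should fall below the bound, usually by a wide margin, since Hoeffding's inequality does not use the variance of the summands.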

B.6 Proof of Lemma 6

First notice that $\operatorname*{argmax}_{A\in\mathcal{A}}\mathbb{M}(\pi,A)$ may not be unique; it is in fact a closed subset of $\mathcal{A}$. In the following, we fix an element $\bar{A}_{\pi}$ of this subset. Letting $\epsilon>0$ and $\eta\in(0,\delta)$ and using Lemma 5, we can split the probability as

[TABLE]

recalling that

[TABLE]

We thus want to bound the quantity $\mathbb{P}_{\theta^{*}}\left(T^{-1}\sum_{t=1}^{T}\sup_{\pi\in[\zeta,1-\zeta]^{Q^{2}}}\left|\mathbb{M}(\pi,\bar{A}_{\pi}^{t})-\mathbb{M}(\pi,\bar{A}_{\pi})\right|>\epsilon r_{n}/(6\sqrt{n})\right)$ on the event $\left\{Z^{1:T}\in\Omega_{\eta}(\theta^{*})\right\}$ , which means bounding

[TABLE]

Let us denote for any matrix $P$ of size $m\times n$ the norm $\|P\|_{\infty}=\max_{(i,j)\in\llbracket 1,m\rrbracket\times\llbracket 1,n\rrbracket}|P_{ij}|$. Then note that, for any matrix $\breve{A}$ with coefficients in $[0,1]$, for any $\pi\in[\zeta,1-\zeta]^{Q^{2}}$, using Assumptions 2 and 3,

[TABLE]

with $c=4(1-\delta)^{2}(1-\zeta)\log(1/\zeta)Q^{4}$ . On the event $\Omega_{\eta}(\theta^{*})$ we then have

[TABLE]

We then show that for any $\epsilon>0$, for every $t\in\llbracket 1,T\rrbracket$ and every $\pi\in[\zeta,1-\zeta]^{Q^{2}}$, for any $n$ such that $n>6c\sqrt{n}/[\epsilon r_{n}(\delta-\eta)]$, there exists some $\breve{A}\in\mathcal{A}^{t}(Z^{1:T})$ such that $\|\breve{A}-\bar{A}_{\pi}\|_{\infty}<\epsilon r_{n}/(6c\sqrt{n})$, i.e. such that for every $q,l$, $|\breve{a}_{ql}-\bar{a}_{ql}|<\epsilon r_{n}/(6c\sqrt{n})$. For every $1\leq q\leq Q$, we can construct $\breve{A}_{q\cdot}=(\breve{a}_{q1},\ldots,\breve{a}_{qQ})$ as follows. On the event $\Omega_{\eta}(\theta^{*})$, for every $q\in\llbracket 1,Q\rrbracket$, for any $n$ such that $n>6c\sqrt{n}/[\epsilon r_{n}(\delta-\eta)]$, we have $N_{q}(Z^{t})\epsilon r_{n}/(6c\sqrt{n})>1$ for every $t\in\llbracket 1,T\rrbracket$. We then construct $(\breve{n}_{ql})_{1\leq l\leq Q}$ as follows and take $\breve{a}_{ql}=\breve{n}_{ql}/N_{q}(Z^{t})$ for every $l\in\llbracket 1,Q\rrbracket$.

• For $l=1$, choose $\breve{n}_{q1}$ as the closest integer to $N_{q}(Z^{t})\bar{a}_{q1}$. It is in the interval $(N_{q}(Z^{t})\bar{a}_{q1}-1,N_{q}(Z^{t})\bar{a}_{q1}+1)$ so we have $|\bar{a}_{q1}-\breve{n}_{q1}/N_{q}(Z^{t})|<1/N_{q}(Z^{t})<\epsilon r_{n}/(6c\sqrt{n})$. Moreover, note that $0\leq\breve{n}_{q1}\leq N_{q}(Z^{t})$ because $0\leq N_{q}(Z^{t})\bar{a}_{q1}\leq N_{q}(Z^{t})$.

• Repeat for $l=2,\ldots,Q$:

– if $\sum_{l^{\prime}=1}^{l-1}(N_{q}(Z^{t})\bar{a}_{ql^{\prime}}-\breve{n}_{ql^{\prime}})\geq 0$, choose $\breve{n}_{ql}$ as the closest bigger (or equal) integer to $N_{q}(Z^{t})\bar{a}_{ql}$;

– if $\sum_{l^{\prime}=1}^{l-1}(N_{q}(Z^{t})\bar{a}_{ql^{\prime}}-\breve{n}_{ql^{\prime}})<0$, choose $\breve{n}_{ql}$ as the closest smaller (or equal) integer to $N_{q}(Z^{t})\bar{a}_{ql}$.

As before, $\breve{n}_{ql}$ is in the interval $(N_{q}(Z^{t})\bar{a}_{ql}-1,N_{q}(Z^{t})\bar{a}_{ql}+1)$ so we have $|\bar{a}_{ql}-\breve{n}_{ql}/N_{q}(Z^{t})|<1/N_{q}(Z^{t})<\epsilon r_{n}/(6c\sqrt{n})$. Moreover $0\leq\breve{n}_{ql}\leq N_{q}(Z^{t})$ because $0\leq N_{q}(Z^{t})\bar{a}_{ql}\leq N_{q}(Z^{t})$. We also have (by induction)

[TABLE]

In the end, we have $|\sum_{l=1}^{Q}(N_{q}(Z^{t})\bar{a}_{ql}-\breve{n}_{ql})|<1$ i.e. $|N_{q}(Z^{t})-\sum_{l=1}^{Q}\breve{n}_{ql}|<1$ , meaning that $\sum_{l=1}^{Q}\breve{n}_{ql}=N_{q}(Z^{t})$ , both $N_{q}(Z^{t})$ and $\sum_{l=1}^{Q}\breve{n}_{ql}$ being integers. Then, if $n>6c\sqrt{n}/[\epsilon r_{n}(\delta-\eta)]$ , there exists $\breve{A}\in\mathcal{A}^{t}(Z^{1:T})$ such that $\|\breve{A}-\bar{A}_{\pi}\|_{\infty}<\epsilon r_{n}/(6c\sqrt{n})$ . This leads to

[TABLE]

which concludes the proof. ∎
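The rounding construction used in this proof can be sketched in a few lines. Given a probability vector (one row of $\bar{A}_{\pi}$) and an integer $N$ playing the role of $N_{q}(Z^{t})$, the procedure returns integers that sum to $N$ and are each within $1$ of the target $N\bar{a}_{ql}$. This is an illustrative sketch: the name `round_row` is hypothetical, and exact rational arithmetic (`Fraction`) is an implementation choice made here to avoid floating-point rounding issues.

```python
import math
from fractions import Fraction


def round_row(weights, N):
    """Integers (n_l) with sum N and |n_l - N * weights[l]| < 1.

    weights must be non-negative Fractions summing to one.
    """
    targets = [N * w for w in weights]
    counts = [round(targets[0])]             # closest integer for the first entry
    deficit = targets[0] - counts[0]         # running sum of (target - count)
    for target in targets[1:]:
        # Round up when the counts lag behind the targets, down otherwise,
        # so that the running deficit always stays strictly between -1 and 1.
        n_l = math.ceil(target) if deficit >= 0 else math.floor(target)
        counts.append(n_l)
        deficit += target - n_l
    return counts


row = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 6), Fraction(1, 6)]  # toy row
N = 10
counts = round_row(row, N)
```

Since the final deficit is an integer (both $N$ and the sum of the counts are integers) of absolute value less than $1$, it is zero, which is the argument used at the end of the proof.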

B.7 Proof of Lemma 7

We can upper bound the expectation as follows

[TABLE]

We have for any $q\in\llbracket 1,Q\rrbracket$

[TABLE]

This implies that

[TABLE]

and identically

[TABLE]

This leads to

[TABLE]

using the fact that $0\leq\alpha_{q}^{*}\leq 1$ for every $q\in\llbracket 1,Q\rrbracket$ . ∎

B.8 Proof of Lemma 8

We first consider the case when $T\rightarrow\infty$ , and $\pi$ is constant over time. We use the following lemma.

Lemma 16.

For any $\theta\in\Theta$ , we have for $\epsilon$ small enough ( $0<\epsilon<\min_{1\leq q\neq q^{\prime}\leq Q}\max_{1\leq l\leq Q}|\pi^{*}_{ql}-\pi^{*}_{q^{\prime}l}|/2$ )

[TABLE]

This gives an upper bound on the probability of interest

[TABLE]

By definition of $\hat{\theta}=(\hat{\Gamma},\hat{\pi})$ , we write

[TABLE]

implying that

[TABLE]

We then obtain the following upper bound, which converges to $0$ as $n$ and $T$ increase by assumption,

[TABLE]

When the number of time steps $T$ is fixed and $\pi$ is allowed to vary over time, the proof is almost the same. Indeed, $\min_{\sigma^{1},\ldots,\sigma^{T}\in\mathfrak{S}_{Q}}\|\hat{\pi}_{\sigma^{1:T}}^{1:T}-\pi^{*1:T}\|_{\infty}>\epsilon\sqrt{v_{n}}$ means that there exists $t\in\llbracket 1,T\rrbracket$ such that $\min_{\sigma^{t}\in\mathfrak{S}_{Q}}\|\hat{\pi}_{\sigma^{t}}^{t}-\pi^{*t}\|_{\infty}>\epsilon\sqrt{v_{n}}$ and we can apply Lemma 16 to this $\hat{\pi}^{t}$ to obtain that $\mathbb{M}(\pi^{*t})-\mathbb{M}(\hat{\pi}^{t})>2\epsilon^{2}\delta^{2}v_{n}/Q^{2}$. This implies that $\mathbb{M}^{T}(\pi^{*1:T})-\mathbb{M}^{T}(\hat{\pi}^{1:T})>2\epsilon^{2}\delta^{2}v_{n}/(TQ^{2})$, which allows us to conclude in the same way as before. ∎

B.9 Proof of Lemma 9

We have

[TABLE]

We decompose this sum as

[TABLE]

In the first sum of the right-hand side of (B.9), the terms are different from zero only for triplets $(i,j,t)$ in $D^{*}$ . Similarly in the last sum, the terms are different from zero for triplets $(i,j,t)$ in $D^{*}\cup\breve{D}$ . As a consequence, we obtain

[TABLE]

We now write the last sum in the right-hand side as

[TABLE]

Distinguishing between the cases where $X_{ij}^{t}=1$ and $X_{ij}^{t}=0$ , we obtain

[TABLE]

In the end, we decompose

[TABLE]

which gives the result.

B.10 Proof of Lemma 10

We first notice that

[TABLE]

For every $t\in\llbracket 1,T\rrbracket$ , we can apply Proposition B.4. from Celisse et al. [2012], as their Assumption (A4) is required to hold only for $z^{*t}$ (see proof) and is valid on $\Omega_{\eta}(\theta)$ with the constant $\delta-\eta$ . We obtain

[TABLE]

We conclude by noticing that $\sum_{t=1}^{T}r(t)=r$ .

B.11 Proof of Lemma 11

The inclusion of the sets is straightforward. Now we have

[TABLE]

B.12 Proof of Lemma 12

First, let us decompose the quantity at stake as follows

[TABLE]

and upper bound the two terms in the right-hand side of (B.12). For the first one we will follow the proof of Theorem 3.9 from Celisse et al. [2012]. Let $z^{1:T}$ denote a fixed configuration. We work on the set $\{Z^{1:T}=z^{1:T}\}$ and write

[TABLE]

Then

[TABLE]

where the last inequality comes from Theorem 2 where the bound is uniform with respect to $z^{1:T}$ .

Now, for the second term of (B.12), we use the following lemma.

Lemma 17.

There exist $c_{1},c_{2}>0$ such that for any $\epsilon>0$ , for any sequence $\{r_{n,T}\}_{n,T\geq 1}$ , we have, as long as $\epsilon r_{n,T}\sqrt{\log n}/(2\alpha^{*}_{q}\gamma^{*}_{ql}\sqrt{nT})<1$ ,

[TABLE]

We then combine the two upper bounds obtained in (49) and (50) in order to conclude, the assumption $\epsilon r_{n,T}\sqrt{\log n}/(2\alpha^{*}_{q}\gamma^{*}_{ql}\sqrt{nT})<1$ being satisfied for $n$ and $T$ large enough because $r_{n,T}=o(\sqrt{nT/\log n})$ . We obtain the expected result, using the fact that $\log(T)=o(n)$ , that $r_{n,T}$ increases to infinity and that $v_{n,T}=o\left(\sqrt{\log(nT)}/n\right)$ ,

[TABLE]

∎

B.13 Proof of Lemma 13

We have the following inequalities by definition of $\hat{z}^{1:T}$ , $\mathcal{J}(\chi,\theta)$ and $\hat{\chi}(\theta)$ and because the Kullback-Leibler divergence is non-negative

[TABLE]

with $\mathcal{J}(\hat{z}^{1:T},\theta)=\ell(\theta)-KL(\delta_{\hat{z}^{1:T}},\mathbb{P}_{\theta}(\cdot|X^{1:T}))$ . We write this Kullback-Leibler divergence (from $\mathbb{P}_{\theta}(\cdot|X^{1:T})$ to $\mathbb{Q}_{\chi}=\delta_{\hat{z}^{1:T}}$ , with $\chi=(\tau,\eta)$ such that $\tau_{iq}^{t}=\hat{z}_{iq}^{t}$ and $\eta^{t}_{iql}=\hat{z}_{iq}^{t}\hat{z}_{il}^{t+1}$ ) as follows

[TABLE]

We then obtain

[TABLE]

Combined with (51), this leads to the following inequality for any parameter $\theta\in\Theta$

[TABLE]

We can conclude that

[TABLE]

∎
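The identity behind this proof, namely that the variational criterion equals the log-likelihood minus a Kullback-Leibler divergence to the conditional distribution of the latent variables, can be checked on a toy finite latent space. The sketch below assumes the usual form $\mathbb{E}_{\mathbb{Q}}[\log p(x,z)]+\mathcal{H}(\mathbb{Q})$ for the criterion and uses made-up joint probabilities; the names `elbo` and `kl` are illustrative.

```python
import math


def elbo(q, joint):
    """E_q[log p(x, z)] + H(q) for a fixed observation x and a finite latent space."""
    return sum(w * (math.log(j) - math.log(w)) for w, j in zip(q, joint) if w > 0)


def kl(q, p):
    """KL(q; p) = E_q[log q - log p], with the convention 0 log 0 = 0."""
    return sum(w * (math.log(w) - math.log(v)) for w, v in zip(q, p) if w > 0)


joint = [0.02, 0.12, 0.06]                   # toy values of p(x, z) for z = 0, 1, 2
evidence = sum(joint)                        # p(x)
posterior = [j / evidence for j in joint]    # p(z | x)
log_lik = math.log(evidence)

dirac = [0.0, 1.0, 0.0]                      # point mass on a hypothetical configuration
gap = log_lik - elbo(dirac, joint)           # compare with kl(dirac, posterior)
```

For a point mass the entropy vanishes and the gap reduces to minus the log conditional probability of the selected configuration, so controlling that probability controls the distance between the criterion and the log-likelihood.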

B.14 Proof of Lemma 14

This proof is quite similar to that of Lemma 12. For any $\epsilon>0$ , let us write

[TABLE]

and upper bound the two probabilities in the right-hand side of this inequality. We already proved in Lemma 12 that the second term converges to $0$ thanks to the assumptions on the sequence $\{r_{n,T}\}_{n,T\geq 1}$. For the first term, let $z^{1:T}$ denote a fixed configuration. Let us work on the set $\{Z^{1:T}=z^{1:T}\}$ and use the same method as in the proof of Lemma 12,

[TABLE]

leading to

[TABLE]

Then we obtain

[TABLE]

For each $z^{1:T}$ , we use the following lemma.

Lemma 18.

Denoting $\tilde{\mathbb{P}}_{\sigma}(\cdot)=\mathbb{P}_{\tilde{\theta}_{\sigma}}(Z^{1:T}=\cdot\>|\>X^{1:T})$ , we have the following inequality for any configuration $z^{1:T}$

[TABLE]

This gives us

[TABLE]

Noticing that the assumptions on $\{r_{n,T}\}_{n,T\geq 1}$ imply that

[TABLE]

we can conclude by applying the result of Theorem 2 with the estimator $\tilde{\theta}_{\sigma}=(\tilde{\Gamma}_{\sigma},\tilde{\pi}_{\sigma})$ for both terms of the right-hand side of (52). ∎

B.15 Proof of Lemma 15

The proof follows the lines of the proof of Lemma C.3. from Celisse et al. [2012]. Let $\mathbb{E}^{*}_{\theta^{*}}[\cdot]$ denote the expectation given $Z^{1:T}=z^{*1:T}$ , i.e. $\mathbb{E}^{*}_{\theta^{*}}[\cdot]=\mathbb{E}_{\theta^{*}}[\cdot\>|\>Z^{1:T}=z^{*1:T}]$ . Introducing a ghost sample $\{\tilde{X}^{t}_{ij}\}_{i,j,t}$ that is independent of $\{X^{t}_{ij}\}_{i,j,t}$ and has the same distribution, we write

[TABLE]

where $\mathbb{E}^{*}_{\theta^{*},X,\tilde{X}}[\cdot]$ denotes the expectation with respect to $\{X,\tilde{X}\}=\{X_{ij}^{t},\tilde{X}_{ij}^{t}\}_{i,j,t}$ under the true parameter $\theta^{*}$ and given $Z^{1:T}=z^{*1:T}$ . At this point, we notice that, if $\{\epsilon^{t}_{ij}\}_{i,j,t}\coloneqq\epsilon$ are $n^{2}T$ independent Rademacher variables, then the random variables

[TABLE]

follow the same distribution, which implies that

[TABLE]

As a consequence, we have

[TABLE]

Then using Jensen’s inequality, Assumption 3 and the bound $\mathrm{Var}_{\epsilon}(\epsilon_{ij}^{t}X^{t}_{ij})\leq 1$ , we get

[TABLE]

where $\Lambda=2\log[(1-\zeta)/\zeta]$ , concluding the proof. ∎

B.16 Proof of Lemma 16

We assume that $\min_{\sigma\in\mathfrak{S}_{Q}}\|\pi_{\sigma}-\pi^{*}\|_{\infty}>\epsilon$ . Without loss of generality, assume that the permutation (or one of the permutations) minimizing this distance is the identity. Let us write, using the fact that $I_{Q}$ the identity matrix of size $Q$ maximizes in $A$ (over the set of $Q\times Q$ stochastic matrices) the quantity $\mathbb{M}(\pi^{*},A)$ (see the proof of Theorem 3.6 in Celisse et al. [2012]) and denoting $(\bar{a}_{qq^{\prime}})_{q,q^{\prime}\in\llbracket 1,Q\rrbracket}$ the coefficients of $\bar{A}_{\pi}$ (thus depending on $\pi$ ),

[TABLE]

denoting $K(p_{1},p_{2})=p_{1}\log(p_{1}/p_{2})+(1-p_{1})\log[(1-p_{1})/(1-p_{2})]>0$ the Kullback-Leibler divergence from a Bernoulli distribution with parameter $p_{2}$ to a Bernoulli distribution with parameter $p_{1}$ . For every $q$ , there exists $q^{\prime}\coloneqq f(q)$ such that $\bar{a}_{qq^{\prime}}\geq 1/Q$ because $\bar{A}_{\pi}$ is a stochastic matrix. Using Assumption 2, we obtain

[TABLE]

thanks to a result on Kullback-Leibler divergence for Bernoulli distributions (see for instance Bubeck [2010], Chapter 10, Section 2, Lemma 10.3). We then want to show that there exist $q,l$ such that $|\pi^{*}_{ql}-\pi_{f(q)f(l)}|>\epsilon$ .

• If $f$ is a permutation, the assumption $\min_{\sigma\in\mathfrak{S}_{Q}}\|\pi_{\sigma}-\pi^{*}\|_{\infty}>\epsilon$ gives the expected result.

• If $f$ is not a permutation, it is not injective and there exist $q_{1}\neq q_{2}$ such that $f(q_{1})=f(q_{2})$. Thanks to Assumption 1, take $l_{0}\in\llbracket 1,Q\rrbracket$ such that $|\pi_{q_{1}l_{0}}-\pi_{q_{2}l_{0}}|=\max_{l\in\llbracket 1,Q\rrbracket}|\pi_{q_{1}l}-\pi_{q_{2}l}|>0$. Then

[TABLE]

leading to either $|\pi^{*}_{q_{1}l_{0}}-\pi_{f(q_{1})f(l_{0})}|\geq|\pi^{*}_{q_{1}l_{0}}-\pi^{*}_{q_{2}l_{0}}|/2>\epsilon$ or $|\pi^{*}_{q_{2}l_{0}}-\pi_{f(q_{2})f(l_{0})}|\geq|\pi^{*}_{q_{1}l_{0}}-\pi^{*}_{q_{2}l_{0}}|/2>\epsilon$ , using the fact that $\epsilon<\min_{1\leq q\neq q^{\prime}\leq Q}\max_{1\leq l\leq Q}|\pi^{*}_{ql}-\pi^{*}_{q^{\prime}l}|/2$ .

So, as there exist $q,l$ such that $|\pi^{*}_{ql}-\pi_{f(q)f(l)}|>\epsilon$ , we have

[TABLE]

∎
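The lower bound on the Bernoulli Kullback-Leibler divergence invoked in this proof is, as far as the constants appearing here suggest, the Pinsker-type inequality $K(p_{1},p_{2})\geq 2(p_{1}-p_{2})^{2}$; this identification is an assumption, since the cited lemma is not reproduced in the text. The sketch below checks that inequality numerically on a grid of parameters.

```python
import math


def bernoulli_kl(p1, p2):
    """K(p1, p2) = p1 log(p1/p2) + (1 - p1) log((1 - p1)/(1 - p2)), for p1, p2 in (0, 1)."""
    return p1 * math.log(p1 / p2) + (1 - p1) * math.log((1 - p1) / (1 - p2))


grid = [k / 20 for k in range(1, 20)]        # 0.05, 0.10, ..., 0.95
# Smallest value of K(a, b) - 2 (a - b)^2 over the grid; it should not be negative.
slack = min(bernoulli_kl(a, b) - 2 * (a - b) ** 2 for a in grid for b in grid)
```

Under this inequality, a gap larger than $\epsilon$ between two connectivity parameters translates into a divergence larger than $2\epsilon^{2}$, which is the mechanism that turns a distance on parameters into a gap on the limiting criterion.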

B.17 Proof of Lemma 17

For any node $i\in\llbracket 1,n\rrbracket$ , the Markov chain $\{Z_{i}^{t}\}_{t\geq 1}$ is geometrically ergodic because its transition matrix $\Gamma$ satisfies Doeblin’s condition thanks to Assumption 2. For any $z\in\llbracket 1,Q\rrbracket$ , let us denote $\delta_{z}$ the Dirac mass at $z$ . There exists a positive constant $A$ and some $r\in(0,1)$ such that $\forall q\in\llbracket 1,Q\rrbracket$ and $\forall t\geq 1$ , we have

[TABLE]

where $\|\cdot\|_{TV}$ is the total variation norm. This leads to

[TABLE]

We now consider the Markov chain $\{Z^{t}=(Z_{1}^{t},\dots,Z_{n}^{t})\}_{t\geq 1}$ of the $n$ nodes evolving through time. Note that it is irreducible and aperiodic. Moreover, its transition matrix is given by $P_{n}=\Gamma^{\otimes n}$ , the $n$ -th Kronecker power of $\Gamma$ and its stationary distribution is $\alpha^{\otimes n}$ . For any $z=(z_{1},\ldots,z_{n})\in\llbracket 1,Q\rrbracket^{n}$ , let us denote $\mu_{n,z}=\otimes_{i=1}^{n}\delta_{z_{i}}$ . For every $t\geq 1$ , we can decompose

[TABLE]

We use

[TABLE]

So, reorganizing the terms, we write

[TABLE]

Let us recall the definition of an $\epsilon$ -mixing time. For any Markov transition matrix $M$ over the set $\mathcal{X}$ with stationary distribution $\alpha$ , for any $\epsilon>0$ , the $\epsilon$ -mixing time of the Markov chain is defined as

[TABLE]

Denoting by $\tau_{n}(\epsilon)$ the $\epsilon$ -mixing time of the Markov chain $\{Z^{t}\}_{t\geq 1}$ , we thus obtain

[TABLE]

Now, we introduce a new Markov chain $Y=\{Y^{t}\}_{t\geq 1}$ , that is defined by

[TABLE]

Notice that it is irreducible and aperiodic, with stationary distribution $\rho$ defined for every state $(q_{1}^{t},\ldots,q_{n}^{t},q_{1}^{t+1},\ldots,q_{n}^{t+1})$ by

[TABLE]

It is easily seen that for any $\epsilon>0$, its $\epsilon$-mixing time $\tau_{Y,n}(\epsilon)$ equals $\tau_{n}(\epsilon)+1$. We apply Theorem 3 from Chung et al. [2012], for any $\eta\leq 1/8$, considering the weight function $f(Y^{t})=\sum_{i=1}^{n}\mathbb{1}\{Z_{i}^{t}=q,\,Z_{i}^{t+1}=l\}$ for every $t\geq 1$ (of expectation $n\alpha^{*}_{q}\gamma^{*}_{ql}$ under the stationary distribution). Then $N_{ql}(Z^{1:T})=\sum_{t=1}^{T-1}f(Y^{t})$, and denoting $\epsilon_{n,T}=\epsilon r_{n,T}\sqrt{\log n}/(2\alpha^{*}_{q}\gamma^{*}_{ql}\sqrt{nT})$, we obtain that there exist $c_{1},c_{2}>0$ such that for any $\epsilon>0$, as long as $\epsilon_{n,T}\leq 1$

[TABLE]

∎
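The concentration of the transition counts $N_{ql}(Z^{1:T})$ around $n(T-1)\alpha^{*}_{q}\gamma^{*}_{ql}$ can be observed in simulation. The sketch below draws $n$ independent stationary chains with a toy transition matrix and compares the normalised count of $q\to l$ transitions with $\alpha_{q}\gamma_{ql}$. The matrix, the sample sizes and the function name are assumptions made for illustration, and the simulation says nothing about the constants $c_{1},c_{2}$ of the lemma.

```python
import random


def simulate_counts(gamma, alpha, n, T, q, l, seed=0):
    """Number of q -> l transitions among n independent chains observed over T steps."""
    rng = random.Random(seed)
    states = list(range(len(alpha)))
    total = 0
    for _ in range(n):
        z = rng.choices(states, weights=alpha)[0]           # stationary initial state
        for _ in range(T - 1):
            z_next = rng.choices(states, weights=gamma[z])[0]
            total += (z == q and z_next == l)
            z = z_next
    return total


gamma = [[0.8, 0.2], [0.4, 0.6]]             # toy transition matrix
alpha = [2 / 3, 1 / 3]                       # its stationary distribution
n, T, q, l = 200, 50, 0, 1
N_ql = simulate_counts(gamma, alpha, n, T, q, l)
ratio = N_ql / (n * (T - 1))                 # compare with alpha[q] * gamma[q][l]
```

Increasing either `n` or `T` tightens the agreement, in line with the joint dependence on $n$ and $T$ in the bound.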

B.18 Proof of Lemma 18

For any configuration $z^{1:T}$ ,

[TABLE]

the third inequality being true because by definition $\mathbb{Q}_{\hat{\chi}(\tilde{\theta}_{\sigma})}$ minimizes $KL(\cdot,\tilde{\mathbb{P}}_{\sigma})$ over the set of variational distributions. ∎

Bibliography

Bartolucci et al. [2018] F. Bartolucci, M. F. Marino, and S. Pandolfi. Dealing with reciprocity in dynamic stochastic block models. Comput. Stat. Data Anal., 123(C):86–100, 2018.
Becker and Holzmann [2018] A.-K. Becker and H. Holzmann. Nonparametric identification in the dynamic stochastic block model. arXiv e-prints, page arXiv:1811.00934, Nov. 2018.
Bickel et al. [2013] P. Bickel, D. Choi, X. Chang, and H. Zhang. Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann. Statist., 41(4):1922–1943, 08 2013.
Boucheron et al. [2013] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. OUP Oxford, 2013.
Bubeck [2010] S. Bubeck. Jeux de bandits et fondations du clustering. PhD thesis, Université Lille 1, 2010.
Celisse et al. [2012] A. Celisse, J.-J. Daudin, and L. Pierre. Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Statist., 6:1847–1899, 2012.
Chung et al. [2012] K.-M. Chung, H. Lam, Z. Liu, and M. Mitzenmacher. Chernoff-Hoeffding bounds for Markov chains: generalized and simplified. In C. Dürr and T. Wilke, editors, 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012), volume 14 of Leibniz International Proceedings in Informatics (LIPIcs), pages 124–135, Dagstuhl, Germany, 2012. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
Daudin et al. [2008] J.-J. Daudin, F. Picard, and S. Robin. A mixture model for random graphs. Statistics and Computing, 18(2):173–183, Jun 2008.
