Consistency of the maximum likelihood and variational estimators in a dynamic stochastic block model
Léa Longepierre (LPSM UMR 8001), Catherine Matias (LPSM UMR 8001)

Sorbonne Université, Université Paris Diderot, Centre National de la Recherche Scientifique,
Laboratoire de Probabilités, Statistique et Modélisation,
4 place Jussieu, 75252 PARIS Cedex 05, FRANCE.
{lea.longepierre,catherine.matias}@sorbonne-universite.fr
Abstract
We consider a dynamic version of the stochastic block model, in which the nodes are partitioned into latent classes and the connection between two nodes is drawn from a Bernoulli distribution depending on the classes of these two nodes. The temporal evolution is modeled through a hidden Markov chain on the node memberships. We prove the consistency (as the number of nodes and time steps increase) of the maximum likelihood and variational estimators of the model parameters, and obtain upper bounds on the rates of convergence of these estimators. We also explore the particular case where the number of time steps is fixed and connectivity parameters are allowed to vary.
Keywords: maximum likelihood estimation, dynamic network, dynamic stochastic block model, variational estimation, temporal network
1 Introduction
Random graphs are a suitable tool to model and describe interactions in many kinds of datasets, such as biological, ecological, social or transport networks. Here we are interested in time-evolving networks, which are a powerful tool for modeling real-world phenomena in which the role or behaviour of the nodes and the relationships between them may change over time. Indeed, it is important to take into account the evolution of the graphs, instead of just studying separate snapshots as static graphs. We focus on graphs evolving in discrete time and refer to Holme (2015) for an introduction to dynamic networks.
A myriad of dynamic graph models have been introduced in recent years; see for instance Zhang et al. (2017). We focus here on those based on the (static) stochastic block model (SBM, Holland et al., 1983), in which the nodes are partitioned into classes. In the SBM, the class memberships of the nodes are represented by latent variables and the connection between two nodes is drawn from a distribution depending on the classes of these two nodes (a Bernoulli distribution in the case of binary graphs). A first discrete-time dynamic version of the SBM is proposed in Yang et al. (2011). There, the nodes are partitioned into classes and the graphs are binary or weighted. The nodes are allowed to change membership over time, these changes being governed by independent Markov chains with values in the classes, while the connection probabilities are constant over time. Xu and Hero (2014) introduce a state-space model on the logit of the connection probabilities for dynamic (binary) networks in which both the connection probabilities and the group memberships vary over time. Unfortunately, their model suffers from parameter identifiability issues (Matias and Miele, 2017). Xu (2015) proposes a stochastic block transition model in which the presence or absence of an edge between two nodes at a given time affects the presence or absence of such an edge at a future time. There, the nodes can change classes over time, new nodes can enter the network, and the connection probabilities are allowed to vary over time. The models in Matias and Miele (2017) and in Becker and Holzmann (2018) are quite similar to that of Yang et al. (2011), except that they allow the connection probabilities to vary over time; the latter is moreover nonparametric.
Bartolucci et al. (2018) extend the model of Yang et al. (2011) to deal with different forms of reciprocity in directed graphs, by directly modeling dyadic relations under the assumption that the dyads are conditionally independent given the latent variables. Paul and Chen (2016) and Han et al. (2015) study multi-graph SBMs, which arise in settings that include dynamic networks and multi-layer networks where each layer corresponds to a type of edge. In these two models, the node memberships remain constant across layers. Pensky (2019) and Pensky et al. (2019) study a dynamic SBM with undirected binary edges where both the connection probabilities and the group memberships vary over time, assuming that the connection probabilities between groups are smooth functions of time. Xing et al. (2010) and Ho et al. (2011) introduce dynamic versions of the mixed-membership stochastic block model, allowing each actor to play different roles when interacting with different peers. Given a known decomposition of the graph into subgraphs, Zreik et al. (2016) introduce the dynamic random subgraph model, in which the latent class membership depends on the subgraph membership and the edges are categorical variables whose types are sampled from a distribution depending on the latent classes of the two nodes. There, a state-space model is used to characterize the temporal evolution of the latent class proportions.
As far as estimation is concerned, different inference methods have been proposed to estimate the groups and the model parameters. The maximum likelihood estimator (MLE) is not tractable in the SBM, and hence neither in its dynamic versions. Variational methods are popular for approximating the MLE (Xing et al., 2010; Ho et al., 2011; Han et al., 2015; Paul and Chen, 2016; Zreik et al., 2016; Matias and Miele, 2017; Bartolucci et al., 2018). Yang et al. (2011) rely on Gibbs sampling and simulated annealing. Pensky et al. (2019) estimate the connection probabilities matrix at each time step with a discrete kernel-type method and obtain a clustering of the nodes by applying spectral clustering to this estimated matrix. They also give an estimator of the number of clusters. Spectral clustering algorithms are also used by Han et al. (2015) on the mean graph over time, and by Liu et al. (2018), who use eigenvector smoothing to obtain some similarity across time periods (and allow the number of classes to be unknown and possibly varying over time).
Some theoretical results on the convergence of these procedures have been established, mainly for static graphs. In the static SBM, Celisse et al. (2012) prove the consistency of the MLE and variational estimates as the number of nodes increases, and Bickel et al. (2013) establish their asymptotic normality. Mariadassou and Matias (2015) take a different approach and give sufficient conditions for the posterior distribution of the groups to converge to a Dirac mass located at the actual group configuration, for every parameter in a neighborhood of the true one. Rohe et al. (2011) give asymptotic results on the normalized graph Laplacian and its eigenvectors for the spectral clustering algorithm, allowing the number of clusters to grow with the number of nodes. They also provide bounds on the number of misclustered nodes, under an assumption on the degree distribution. Lei and Rinaldo (2015) prove the consistency of community recovery by spectral clustering on the adjacency matrix under milder conditions on the degrees, and extend this result to degree-corrected stochastic block models. Klopp et al. (2017) derive oracle inequalities for the connection probabilities estimator and obtain minimax estimation rates, including the sparse case where the edge density converges to zero as the number of nodes increases, thus extending previous results of Gao et al. (2015). Gaucher and Klopp (2019) propose a bound on the risk of the maximum likelihood estimator of network connection probabilities, and show that it is minimax optimal in the sparse graphon model.
In the dynamic setting, fewer theoretical results have been established. Pensky (2019) derives a penalized least squares estimator of the connection probabilities that is adaptive to the number of blocks and does not require knowledge of the number of classes. She shows that it satisfies an oracle inequality. Under the additional assumption that at most nodes change groups between two time steps, this estimator attains the minimax lower bounds for the risk. She also introduces a dynamic graphon model and shows that the estimators (which do not require knowledge of the degree of smoothness of the graphon function) are minimax optimal up to a logarithmic factor in the number of time steps. Based on the same dynamic SBM with at most nodes changing groups between two time steps, Pensky et al. (2019) give an upper bound on the (non-asymptotic) error of their estimators of the connection probabilities matrix and of the group memberships (and also give an estimator of the number of clusters). Han et al. (2015) show the consistency (as the number of time steps increases while the number of nodes is fixed) of two estimators of the class memberships in dynamic SBMs (and more generally multi-graph SBMs) in which the node memberships are constant over time while the connection probabilities are allowed to vary, the graphs being binary and symmetric. They show that the spectral clustering estimator of the class memberships (applied to the mean graph over time) is consistent under some stationarity and ergodicity conditions on the connection probabilities. They also prove that the MLE of the class memberships is consistent (i.e. that the fraction of misclustered nodes converges to 0) in the general case (without any structure on the connection probabilities), provided certain sufficient conditions are satisfied.
In their multi-layer model, Paul and Chen (2016) give minimax rates of misclassification under certain conditions on the growth of the number of types of relations, the number of nodes and the number of classes, extending the result of Han et al. (2015).
Here, we consider a dynamic version of the binary SBM as in Yang et al. (2011), where each node may change group membership at each time step according to a Markov chain, independently of the other nodes. We prove the consistency of the MLE of the connectivity parameter and, under some additional conditions, of the MLE of the transition matrix, as both the number of nodes and the number of time steps increase. We also give upper bounds on the rates of convergence of these estimators. While these upper bounds are known to be non-optimal in the static case, where asymptotic normality holds with classical parametric rates of convergence (Bickel et al., 2013), they are the first to be established for the MLE in a dynamic setting. As already mentioned, the log-likelihood is intractable (except for very small numbers of nodes and time steps), as it requires a sum over all latent configurations, whose number grows exponentially with the product of the number of nodes and the number of time steps. Thus, while its consistency remains an important result, the estimator cannot be computed in practice. A possible alternative is to rely on a variational estimator to approximate the MLE (see for instance Matias and Miele, 2017). We also establish the consistency of the variational estimator of the connectivity parameter and, under some additional assumptions, that of the variational estimator of the transition matrix, and obtain the same upper bounds on the rates of convergence as for the MLE. In the particular case where the number of time steps is fixed, we also consider the model of Matias and Miele (2017), in which the connection probabilities are allowed to vary over time, and generalise these results when only the number of nodes increases. With a single time step, we not only recover the results of Celisse et al. (2012) but extend them by providing rates of convergence. Unlike in the models studied by Han et al. (2015) and Paul and Chen (2016), the node memberships in our model evolve over time. Our context also differs from that of Pensky (2019), which focuses on least squares estimation.
This article is organized as follows. Section 2 introduces our model and notation. More precisely, Section 2.1 describes the dynamic stochastic block model introduced in Yang et al. (2011), Section 2.2 gives the assumptions on the model parameters, Section 2.3 describes the dynamic stochastic block model of Matias and Miele (2017) for the finite time case, and Section 2.4 gives the expression of the likelihood of this model and defines the MLE. Section 3 establishes the consistency of the MLE, together with upper bounds on its rates of convergence, for the connection probabilities in Section 3.1 and for the transition matrix in Section 3.2. Section 4 is dedicated to variational estimators: Sections 4.1 and 4.2 establish the consistency of the variational estimators of the connection probabilities and of the transition matrix, respectively, along with upper bounds on the associated rates of convergence. All proofs of the main results are postponed to Section 5, except those for the case of a fixed number of time steps, which are given in Appendix A, while the more technical proofs are deferred to Appendix B.
2 Model and notation
2.1 Dynamic stochastic block model
We consider a set of vertices, forming a sequence of binary undirected graphs with no self-loops at each time . The case of a set of directed graphs, with or without self-loops, may be handled similarly. These vertices are assumed to be split into latent classes, and we denote by the label of the -th vertex at time . Letting , we assume that the are independent and identically distributed (iid) and each is a homogeneous and stationary Markov chain with transition probabilities
[TABLE]
where is a stochastic matrix, i.e. with nonnegative coefficients and with each row summing to 1. We let denote the stationary distribution of the Markov chain. For any , the probability distribution of is then
[TABLE]
We will also denote and .
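To make the latent layer concrete, here is a minimal Python sketch (our illustration, not the authors' code) that computes the stationary distribution of a transition matrix by power iteration and then draws the node memberships as iid stationary Markov chains. The function names `stationary_distribution` and `simulate_memberships`, the 0-based class labels and the nested-list layout `z[i][t]` are hypothetical choices. Power iteration is assumed to converge, which is the case when all entries of the transition matrix are positive, as under Assumption 2 below.

```python
import random


def stationary_distribution(pi, tol=1e-12, max_iter=100_000):
    """Stationary distribution alpha of a stochastic matrix pi (alpha = alpha pi).

    Power iteration from the uniform distribution; assumes all entries of pi
    are positive, so that the chain is irreducible and aperiodic.
    """
    Q = len(pi)
    alpha = [1.0 / Q] * Q
    for _ in range(max_iter):
        new = [sum(alpha[q] * pi[q][r] for q in range(Q)) for r in range(Q)]
        if max(abs(a - b) for a, b in zip(alpha, new)) < tol:
            return new
        alpha = new
    return alpha


def simulate_memberships(pi, n, T, seed=None):
    """Draw n iid stationary Markov chains of length T on classes 0..Q-1.

    Returns z with z[i][t] the (0-based) class of node i at time t.
    """
    rng = random.Random(seed)
    Q = len(pi)
    alpha = stationary_distribution(pi)
    z = []
    for _ in range(n):
        # Initial class drawn from the stationary distribution ...
        chain = [rng.choices(range(Q), weights=alpha)[0]]
        # ... then one Markov transition per additional time step.
        for _ in range(1, T):
            chain.append(rng.choices(range(Q), weights=pi[chain[-1]])[0])
        z.append(chain)
    return z
```

For instance, with the transition matrix `[[0.9, 0.1], [0.2, 0.8]]` the stationary distribution is (2/3, 1/3), since the balance equation 0.1 alpha_1 = 0.2 alpha_2 forces alpha_1 = 2 alpha_2.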
Consider the symmetric binary adjacency matrix of the graph at time such that for all nodes , we have and . Each follows a stochastic block model so that, conditional on the latent groups , the are independent Bernoulli random variables
[TABLE]
where are the connectivity parameters. More precisely, conditional on the whole sequence of latent groups , the graphs are assumed to be independent, each having a distribution depending only on . The model is thus parameterized by , with and . Note that is a symmetric matrix in the undirected setup. We denote by (resp. ) the probability distribution (resp. expectation) of all the random variables , under the parameter value . In the following, we assume that we observe and we denote by the true parameter value, with corresponding probability distribution and expectation , and by the (true) stationary distribution corresponding to the transition matrix . We also let denote the indicator function of the set and the complement of in the ambient set. For any integer , the set is the set of integers between and . For any finite set , let denote its cardinality. For any configuration , we denote by (resp. ) the number of nodes assigned to class by the configuration (resp. the number of transitions from class to class in configuration ), that is
[TABLE]
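Continuing the illustration, the sketch below draws the adjacency matrices given the memberships and computes the two counts just defined. It assumes, as our reading of the definition, that the class count runs over all (node, time) pairs of the configuration and that transitions are counted along each node's trajectory; the helper names are hypothetical and classes are labeled 0 to Q-1.

```python
import random


def simulate_graphs(z, beta, seed=None):
    """Symmetric binary adjacency matrices X[t] with no self-loops.

    Conditional on z, X[t][i][j] ~ Bernoulli(beta[z[i][t]][z[j][t]]) independently
    for i < j, and X[t][j][i] = X[t][i][j].
    """
    rng = random.Random(seed)
    n, T = len(z), len(z[0])
    X = []
    for t in range(T):
        A = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                p = beta[z[i][t]][z[j][t]]
                A[i][j] = A[j][i] = int(rng.random() < p)
        X.append(A)
    return X


def class_counts(z, Q):
    """N_q(z): number of (node, time) pairs that z assigns to class q."""
    N = [0] * Q
    for chain in z:
        for q in chain:
            N[q] += 1
    return N


def transition_counts(z, Q):
    """N_{qq'}(z): number of transitions from q to q' along all node trajectories."""
    N = [[0] * Q for _ in range(Q)]
    for chain in z:
        for a, b in zip(chain, chain[1:]):
            N[a][b] += 1
    return N
```

For the configuration `[[0, 0, 1], [1, 1, 1]]` (two nodes, three time steps), `class_counts` returns `[2, 4]` and `transition_counts` returns `[[1, 1], [0, 2]]`.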
We also define for any two parameters and the following distances
[TABLE]
2.2 Assumptions
The assumptions we make on the model parameters are the following.
1. For every , there exists some such that .
2. There exists some such that for any , we have .
3. There exists some such that for any , we have .
Assumption 1 is necessary for the identifiability of the model. Indeed, if it does not hold, we cannot distinguish between classes and . Assumption 2 ensures that each Markov chain is irreducible, aperiodic and recurrent. This assumption could be weakened at the cost of additional technicalities. In particular, it implies that the stationary distribution exists. Moreover, Assumption 2 also implies that for any , we have . Note that this can be seen as the dynamic counterpart of Assumption 2 in Celisse et al. (2012) (on the probability distribution of the class memberships). Celisse et al. (2012) however make an additional assumption, which is an empirical version of this one (stating that the observed class proportions are bounded away from 0) and holds with high probability. We do not make such an assumption and instead use the fact that the probability of this event converges to 1. Assumption 3 is technical and could also be weakened with additional technicalities. For example, Celisse et al. (2012) also consider the case (i.e. ) whereas we do not. The whole parameter set defined by these constraints is denoted by . In the following, we assume that .
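The following Python sketch encodes one possible formalization of Assumptions 1 to 3 with a single constant `zeta`: distinct rows of the connectivity matrix, transition probabilities bounded below by `zeta`, and connectivity parameters in [zeta, 1 - zeta]. This formalization and the function names are assumptions on our part. The sketch also checks numerically two consequences of a lower bound on the transition probabilities: every stationary probability satisfies alpha_q = sum_{q'} alpha_{q'} pi_{q'q} >= zeta sum_{q'} alpha_{q'} = zeta, and the Dobrushin contraction coefficient of the chain is at most 1 - Q zeta (each of the Q overlaps min(pi_{ql}, pi_{q'l}) is at least zeta), which yields geometric convergence to the stationary distribution.

```python
def stationary_distribution(pi, n_iter=10_000):
    """Stationary distribution by power iteration (assumes all entries of pi > 0)."""
    Q = len(pi)
    alpha = [1.0 / Q] * Q
    for _ in range(n_iter):
        alpha = [sum(alpha[q] * pi[q][r] for q in range(Q)) for r in range(Q)]
    return alpha


def check_assumptions(pi, beta, zeta):
    """Hypothetical formalization of Assumptions 1-3, returned as three booleans.

    1: any two classes q != q' differ in at least one entry of their beta rows;
    2: every entry of pi is at least zeta;
    3: every entry of beta lies in [zeta, 1 - zeta].
    """
    Q = len(pi)
    a1 = all(beta[q] != beta[r] for q in range(Q) for r in range(q + 1, Q))
    a2 = all(p >= zeta for row in pi for p in row)
    a3 = all(zeta <= b <= 1 - zeta for row in beta for b in row)
    return a1, a2, a3


def dobrushin_coefficient(pi):
    """1 - min over pairs of rows of sum_l min(pi[q][l], pi[q'][l])."""
    Q = len(pi)
    overlap = min(sum(min(pi[q][l], pi[r][l]) for l in range(Q))
                  for q in range(Q) for r in range(Q))
    return 1.0 - overlap
```

With `pi = [[0.9, 0.1], [0.2, 0.8]]` and `zeta = 0.1`, the coefficient is 0.7, below the bound 1 - 2 * 0.1 = 0.8.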
In what follows, we work up to label permutation on the groups. Indeed, as in any latent group model, the parameters can only be recovered up to label switching on the latent groups. We then define the following notation for any permutation with the set of permutations on
[TABLE]
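Working up to label switching amounts to minimizing a distance over the Q! relabelings of the classes. The sketch below assumes, for illustration only, that the distance is the largest absolute entrywise difference between matrices; the paper's own distances are the ones defined earlier in this section, and the function names are hypothetical. The brute-force minimum over `itertools.permutations` is only practical for small Q.

```python
from itertools import permutations


def permuted(M, sigma):
    """Relabel a Q x Q matrix: entry (q, l) becomes M[sigma[q]][sigma[l]]."""
    Q = len(M)
    return [[M[sigma[q]][sigma[l]] for l in range(Q)] for q in range(Q)]


def sup_distance(M, N):
    """Largest absolute entrywise difference between two Q x Q matrices."""
    return max(abs(a - b) for row_m, row_n in zip(M, N) for a, b in zip(row_m, row_n))


def distance_up_to_labels(M, N):
    """Minimum over label permutations sigma of sup_distance(permuted(M, sigma), N).

    Brute force over all Q! permutations; only meant for small Q.
    """
    Q = len(M)
    return min(sup_distance(permuted(M, sigma), N) for sigma in permutations(range(Q)))
```

Swapping the two labels of `[[0.7, 0.2], [0.2, 0.4]]` gives `[[0.4, 0.2], [0.2, 0.7]]`, which is at distance 0 up to labels even though the entrywise distance without relabeling is 0.3.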
2.3 Finite time case
If the number of time steps is fixed, it is possible to let the connection probabilities vary over time. We then consider this case, the connection parameter now being with for every and for any . Note that this is the more general model of Matias and Miele (2017), in which the model parameter is . Moreover, we introduce the following Assumptions 1’ and 3’ that are alternate versions of Assumptions 1 and 3 respectively for the finite time case.
1’. For every , for every , there exists some such that .
3’. There exists some such that for every , for any , we have .
Assumption 1’ (resp. Assumption 3’) expresses that for every , satisfies Assumption 1 (resp. Assumption 3). We also introduce the following additional assumption, which ensures (together with Assumption 1’) that the model is identifiable (up to a label permutation). See Matias and Miele (2017).
4. For every , for every , and are distinct values.
Assumption 4 states that the diagonal of does not change over time, and that its values are distinct. We denote by the set of parameters satisfying Assumptions 1’, 2, 3’ and 4. As before, we assume in the following that in the fixed case. We also define as before for any and the distance
[TABLE]
2.4 Likelihood
The conditional log-likelihood and the log-likelihood are given by
[TABLE]
respectively. We then denote the maximum likelihood estimator (MLE) by
[TABLE]
In the next section, we study separately the consistency of the connectivity parameter estimator and that of the transition matrix estimator .
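The next sketch evaluates the log-likelihood exactly by brute force, summing P(Z = z) P(X | Z = z) over every latent configuration with a log-sum-exp. The complete-data term follows the model of Section 2.1: a stationary initial law, Markov transitions for each node, and independent Bernoulli edges for pairs i < j at each time. The function names and the data layout (`X[t][i][j]`, `z[i][t]`) are our assumptions. It is only meant for toy sizes: the loop runs over Q^(nT) configurations, e.g. 2^100 for two classes, ten nodes and ten time steps, which is exactly why the MLE cannot be computed in practice.

```python
import itertools
import math


def log_complete(X, z, pi, alpha, beta):
    """log P(Z = z) + log P(X | Z = z) for one configuration z[i][t]."""
    n, T = len(z), len(z[0])
    lp = 0.0
    for i in range(n):
        lp += math.log(alpha[z[i][0]])  # stationary initial class
        for t in range(1, T):
            lp += math.log(pi[z[i][t - 1]][z[i][t]])  # Markov transition
    for t in range(T):
        for i in range(n):
            for j in range(i + 1, n):  # undirected graph, no self-loops
                p = beta[z[i][t]][z[j][t]]
                lp += math.log(p if X[t][i][j] else 1.0 - p)
    return lp


def log_likelihood(X, pi, alpha, beta):
    """Exact log-likelihood: log-sum-exp over all Q**(n*T) configurations."""
    Q, T, n = len(pi), len(X), len(X[0])
    terms = []
    for flat in itertools.product(range(Q), repeat=n * T):
        z = [list(flat[i * T:(i + 1) * T]) for i in range(n)]
        terms.append(log_complete(X, z, pi, alpha, beta))
    m = max(terms)
    return m + math.log(sum(math.exp(v - m) for v in terms))
```

As a sanity check, summing the likelihood over all possible observations of a tiny network (two nodes, two time steps) returns 1 up to rounding.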
3 Consistency of the maximum likelihood estimate
3.1 Connectivity parameter
We first prove the consistency of the maximum likelihood estimator of the connectivity parameter when the number of nodes and time steps increase. We denote the normalized log-likelihood by
[TABLE]
and introduce the quantities, for any the set of stochastic matrices,
[TABLE]
where . It is worth noting that , which will be the limiting value for when and increase (see below), does not depend on .
Theorem 1.
For any sequence increasing to infinity, if , we have for all
[TABLE]
We then deduce the consistency of the maximum likelihood estimator of the connection probabilities from the following corollary, which also provides an upper bound on the rate of convergence of this estimator.
Corollary 1.
For any sequence increasing to infinity such that and if , we have for every
[TABLE]
We now seek analogous consistency results when the number of time steps is fixed and only the number of nodes increases. In that case, denoting by the MLE of , we have the following corollary, which is the analogue of Corollary 1.
Corollary 2.
If the number of time steps is fixed, we have for every and for any sequence increasing to infinity such that
[TABLE]
denoting .
This result states that converges to 0 in -probability as increases, i.e. the MLE of the connection probabilities is consistent up to label switching, and it gives an upper bound on the rate of convergence of this MLE. The particular case of a single time step (the static SBM) thus yields a stronger result than that of Celisse et al. (2012), where no rate of convergence is given.
Remark 1.
Note that in Corollaries 1 and 2, the results still hold for any sequences and increasing to infinity, respectively. However, we are interested in sequences increasing slowly to infinity, giving the strongest results, namely the smallest lower bounds. Indeed, whenever these assumptions are not satisfied, the lower bounds appearing in the inequalities are larger, and the results may even become trivial.
3.2 Latent transition matrix
We now prove that the MLE for the transition matrix is consistent when the number of nodes and time steps increase.
Lemma 1.
Any critical point of the likelihood function is such that satisfies the fixed-point equation
[TABLE]
There are two possible cases for the MLE:
- Either is a critical point of the likelihood function. Then satisfies equation (4).
- Or is not a critical point (this can happen if it belongs to the boundary of ), and we assume that there exists such that and satisfies equation (4) (at least for and large enough). We then choose as our estimator . By an abuse of notation, we will denote this estimator and call it MLE in the following.
In what follows, for any fixed configuration , any and any , we consider the event
[TABLE]
The following result establishes that asymptotically, any estimator that correctly estimates the transition probability matrix also recovers the group memberships. This result is similar to Theorem 1 in Mariadassou and Matias (2015).
Theorem 2.
For any estimator (at least for and large enough), if , there exist some positive constants such that for any , for any positive sequence such that , any and for and large enough, we have
[TABLE]
whenever is a sequence decreasing to 0 such that .
Theorem 3.
If , for any and any sequence increasing to infinity such that , we have for any
[TABLE]
with a sequence decreasing to 0 such that .
Corollary 3.
Assume that and with a sequence decreasing to 0 such that . Then for any and any sequence increasing to infinity such that , we have the convergence
[TABLE]
Remark 2.
Note that the upper bound obtained in Corollary 1 on the rate of convergence in probability of does not ensure that holds. While the latter has never been established (to our knowledge), it is a reasonable assumption.
We now want a result analogous to Corollary 3 when the number of time steps is fixed and the connection probabilities vary over time (the connection parameter being with ). For that, we need an analogue of Theorem 2 in this case.
Theorem 4.
For any fixed , for any estimator (at least for large enough), there exist some positive constants such that for any , for any positive sequence such that , any and for large enough, we have
[TABLE]
whenever is a sequence decreasing to 0 such that .
The following corollary gives the expected result.
Corollary 4.
Let the number of time steps be fixed. Assume that with a sequence decreasing to 0 such that . Then for any and any sequence increasing to infinity such that , we have the convergence
[TABLE]
The proof of Corollary 4 is the same as that of Corollary 3, but relying on Theorem 4 instead of Theorem 2 and is therefore omitted.
Remark 3.
As in Remark 1 for Corollaries 1 and 2, the results of Corollaries 3 and 4 still hold for sequences and increasing to infinity at any rate.
4 Variational estimators
In practice, the MLE cannot be computed except for very small values of and , because it involves a summation over all possible latent configurations. Nor can we use the Expectation-Maximization (EM) algorithm to approximate it, because it requires the conditional distribution of the latent variables given the observations, which is not tractable. A common solution is to use the Variational Expectation-Maximization (VEM) algorithm, which optimizes a lower bound of the log-likelihood (see for example Daudin et al., 2008). Let us denote for every and . Using the same approach as in Matias and Miele (2017) for the VEM algorithm in the dynamic SBM, we consider a variational approximation of the conditional distribution of the latent variables given the observed variables, in the class of probability distributions parameterized by of the form
[TABLE]
i.e. with such that and . Notice that . The quantity to optimize in the VEM algorithm is then
[TABLE]
with denoting the Kullback-Leibler divergence and denoting the entropy. Define
[TABLE]
and the variational estimator of
[TABLE]
Moreover, we denote . In practice, the VEM algorithm is an iterative algorithm that maximizes the function alternately with respect to and in order to find .
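For the Markov variational family above, both the lower bound J and the update of the connectivity parameter can be written in closed form once the marginals of each node's variational chain are propagated forward. The sketch below is our own illustration under that reading, not the authors' implementation: `tau0[i]` is the initial law of node i and `tau[i][s]` its transition matrix between times s and s + 1 (hypothetical names), the edge term uses the fact that nodes are independent under the variational law, and the entropy uses the chain rule for Markov chains. The update of beta, for beta constant over time, is the standard weighted block density of variational EM for the SBM (Daudin et al., 2008), computed over ordered pairs i != j so that the result is symmetric; it is only one half of the alternating scheme, and the update of the variational parameters is not shown.

```python
import math


def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def marginals(tau0, tau):
    """m[i][t][q] = Q_tau(Z_i^t = q), propagated forward along each node's chain.

    tau0[i][q]: initial law of node i; tau[i][s][q][r]: its variational transition
    probability from class q at time s to class r at time s + 1.
    """
    m = []
    for i in range(len(tau0)):
        mi = [list(tau0[i])]
        for P in tau[i]:
            prev = mi[-1]
            mi.append([sum(prev[q] * P[q][r] for q in range(len(prev)))
                       for r in range(len(prev))])
        m.append(mi)
    return m


def elbo(X, pi, alpha, beta, tau0, tau):
    """J(theta, tau) = E_Q[log P(X, Z)] + H(Q) for the Markov variational family."""
    m = marginals(tau0, tau)
    n, T, Q = len(m), len(m[0]), len(alpha)
    J = 0.0
    for i in range(n):
        # Initial class: expected log alpha plus entropy of the initial law.
        J += sum(xlogy(m[i][0][q], alpha[q]) - xlogy(m[i][0][q], m[i][0][q])
                 for q in range(Q))
        # Transitions: expected log pi plus conditional entropy (chain rule).
        for s, P in enumerate(tau[i]):
            for q in range(Q):
                for r in range(Q):
                    J += (xlogy(m[i][s][q] * P[q][r], pi[q][r])
                          - m[i][s][q] * xlogy(P[q][r], P[q][r]))
    for t in range(T):
        # Edges: nodes are independent under Q_tau, so pair weights factorize.
        for i in range(n):
            for j in range(i + 1, n):
                for q in range(Q):
                    for l in range(Q):
                        b = beta[q][l] if X[t][i][j] else 1.0 - beta[q][l]
                        J += m[i][t][q] * m[j][t][l] * math.log(b)
    return J


def m_step_beta(X, m):
    """Maximizer of J in beta (constant over time) for fixed marginals m.

    beta[q][l] = sum_t sum_{i != j} m_i^t(q) m_j^t(l) X_ij^t
                 / sum_t sum_{i != j} m_i^t(q) m_j^t(l)   (nan if the weight is 0).
    """
    n, T, Q = len(m), len(m[0]), len(m[0][0])
    num = [[0.0] * Q for _ in range(Q)]
    den = [[0.0] * Q for _ in range(Q)]
    for t in range(T):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                for q in range(Q):
                    for l in range(Q):
                        w = m[i][t][q] * m[j][t][l]
                        num[q][l] += w * X[t][i][j]
                        den[q][l] += w
    return [[num[q][l] / den[q][l] if den[q][l] > 0 else float("nan")
             for l in range(Q)] for q in range(Q)]
```

With a single class, J coincides with the exact log-likelihood, and with hard (0/1) marginals `m_step_beta` reduces to the observed edge densities within and between the estimated blocks. Since J is the log-likelihood minus a Kullback-Leibler divergence, it never exceeds the log-likelihood.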
4.1 Connectivity parameter
Theorem 5.
For any sequence increasing to infinity, if , we have for all
[TABLE]
We deduce the consistency of the variational estimators of the connection probabilities as and increase from the following corollary.
Corollary 5.
For any sequence increasing to infinity such that , we have for any
[TABLE]
We have the following analogous corollary for a fixed number of time steps.
Corollary 6.
If the number of time steps is fixed, we have for every and for any sequence increasing to infinity such that
[TABLE]
Remark 4.
As for Corollaries 1 to 4, the results of Corollaries 5 and 6 still hold for any sequences and increasing to infinity.
4.2 Latent transition matrix
We now prove that is consistent when the number of nodes and time steps increase.
Lemma 2.
Any critical point of the function is such that satisfies the fixed-point equation
[TABLE]
We assume that is a critical point of . Then we have the fixed-point equation
[TABLE]
The following theorem gives the consistency and a rate of convergence of this estimator, under an assumption on the rate of convergence of .
Theorem 6.
If , for any and any sequence increasing to infinity such that and for any
[TABLE]
with a sequence decreasing to 0 such that .
Corollary 7.
Assume that and with a sequence decreasing to 0 such that . Then for any and any sequence increasing to infinity such that , we have the convergence
[TABLE]
The proof of Corollary 7 is the same as that of Corollary 3, using Theorem 6 instead of Theorem 3 and is therefore omitted.
When the number of time steps is fixed and the connection probabilities can vary over time, we have the following Corollary that is the equivalent of Corollary 7.
Corollary 8.
Let the number of time steps be fixed. Assume that with a sequence decreasing to 0 such that . Then for any and any sequence increasing to infinity such that , we have the convergence
[TABLE]
The proof of Corollary 8 is the same as that of Corollary 7, but relying on Theorem 4 instead of Theorem 2 and is therefore omitted.
Remark 5.
As for Corollaries 1 to 6, the results of Corollaries 7 and 8 still hold for any sequences and increasing to infinity.
5 Proofs of main results
5.1 Proof of Theorem 1
The proof follows the lines of the proof of Theorem 3.6 in Celisse et al. (2012). Nonetheless, our result is sharper, as we establish an upper bound on the rate of convergence (in probability) of the normalized likelihood. We fix some and introduce the quantities
[TABLE]
Note that is a random variable that depends on and that
[TABLE]
Similarly, for any , we have .
We bound the difference between and by introducing three intermediate terms so that we can write, for any sequence and any
[TABLE]
In the following, we prove separately the convergence (in -probability) to zero of the three terms of this sum, while controlling the rates of these convergences. Before starting, let us remark that we have
[TABLE]
In particular, for every , we have
[TABLE]
First term of the right-hand side of (10).
We let
[TABLE]
Lemma 3.
For every , we have
[TABLE]
Going back to (13) and applying Lemma 3, we get
[TABLE]
Now, using classical dependency rules in directed acyclic graphs (see e.g. Lauritzen, 1996) combined with Assumption 2, we get
[TABLE]
This implies that as soon as . Then for any sequence increasing to infinity, for any , we have that as and increase.
Second term of the right-hand side of (10).
Let us denote
[TABLE]
For the sake of clarity, we study this term on the event where is a fixed configuration. On this event, we define following Equation (8) as
[TABLE]
or equivalently for every ,
[TABLE]
By definition of and respectively, we have the two inequalities
[TABLE]
and
[TABLE]
implying the lower and upper bounds
[TABLE]
Taking the absolute value gives us an upper bound for
[TABLE]
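The two-sided bound obtained in this step follows the pattern of an elementary inequality for suprema, recalled here in generic form for real functions f and g on a common set A with finite suprema (the names f, g and A are ours, and we assume the two compared quantities are suprema over a common set):

```latex
\begin{aligned}
f(a) &\le g(a) + \sup_{b \in A} |f(b) - g(b)|
      \le \sup_{b \in A} g(b) + \sup_{b \in A} |f(b) - g(b)|
      \quad \text{for all } a \in A,\\
\sup_{a \in A} f(a) - \sup_{a \in A} g(a) &\le \sup_{a \in A} |f(a) - g(a)|,\\
\Big| \sup_{a \in A} f(a) - \sup_{a \in A} g(a) \Big| &\le \sup_{a \in A} |f(a) - g(a)|,
\end{aligned}
```

where the second line follows by taking the supremum over a in the first one, and the third by exchanging the roles of f and g.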
Using Equations (11) and (12), we then obtain the following upper bound for
[TABLE]
We use the following concentration result to conclude.
Lemma 4.
Let and let be a sequence of positive real numbers. We let denote the probability conditional on under parameter , i.e. . Denoting we have for any
[TABLE]
with .
Let us choose in the above lemma. For any , for any sequence increasing to infinity, we have for and large enough
[TABLE]
Then for and large enough, the first term in the right-hand side of inequality (4) is equal to 0 and we have
[TABLE]
Third term of the right-hand side of (10).
Let us denote
[TABLE]
For any fixed configuration , analogous to Equation (12), we write
[TABLE]
where is the (random variable) number of nodes classified in group in the current (random) configuration , while they belong to group in (deterministic) configuration . Recall that is the number of nodes assigned to class by the configuration and let us denote the (random) proportion of vertices from class in attributed to class by . We write
[TABLE]
with .
Now extending these notations to the case where , we let where . We remark that the definition of implies that with the (random) subset of stochastic matrices defined for every by
[TABLE]
Let us also denote . Then
[TABLE]
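The proportions introduced above can be computed directly from two configurations. In this hypothetical helper, pairs are again counted over all (node, time) pairs (our assumption), and a row is left undefined when the class is empty under the reference configuration; every defined row sums to 1, which is why these proportions form a stochastic matrix.

```python
def proportions(z, z_star, Q):
    """R[q][l]: fraction of the (node, time) pairs in class q under z_star that z puts in class l.

    Rows with N_q(z_star) = 0 are returned as None, since the ratio is undefined.
    """
    counts = [[0] * Q for _ in range(Q)]
    for chain, chain_star in zip(z, z_star):
        for l, q in zip(chain, chain_star):
            counts[q][l] += 1
    R = []
    for q in range(Q):
        total = sum(counts[q])  # equals N_q(z_star)
        R.append([c / total for c in counts[q]] if total > 0 else None)
    return R
```

For `z_star = [[0, 0], [1, 1]]` and `z = [[0, 1], [1, 1]]`, the result is `[[0.5, 0.5], [0.0, 1.0]]`.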
We start by stating a concentration lemma on the random variable for any and any .
Lemma 5.
For any and any , let
[TABLE]
Then .
Building on the previous concentration lemma, the following one gives the convergence in -probability of the second term in the right-hand side of (15).
Lemma 6.
For any , any and any positive sequence,
[TABLE]
with .
Then taking any , for any , for any sequence increasing to infinity, we have the following inequality for and large enough
[TABLE]
implying that the probability in Lemma 6 converges to [math] as and increase for any , as long as . Now, for the first term in the right-hand side of (15), note that we have for every and every
[TABLE]
Then, either and
[TABLE]
or and
[TABLE]
In both cases, we get that for every and , thus obtaining the upper bound
[TABLE]
Letting
[TABLE]
and recalling that (for every ) for every , we have
[TABLE]
Finally, we bound the first term of the right-hand side of (15) as follows
[TABLE]
Applying Markov’s Inequality, we obtain
[TABLE]
The following lemma gives an upper bound of the expectation appearing in the previous inequality, for any .
Lemma 7.
For any and any , we have the following inequality
[TABLE]
This leads to
[TABLE]
Then for any , for any sequence increasing to infinity, we have the convergence
[TABLE]
We proved the convergence to [math] of the three terms in the right-hand side of (10) for any sequence increasing to infinity and as long as . This gives the expected result and concludes the proof. ∎
5.2 Proof of Corollary 1
To prove this corollary, we establish the following lemma, which allows us to obtain a rate of convergence of to from a rate of convergence of to . This lemma is slightly more general than needed here: it also gives an analogous result when the number of time steps is fixed, which will be useful for Corollary 2.
Lemma 8.
Let be any random functions on the set (resp. ) and (resp. ) defined as before. Assume that there exists a sequence (resp. ) a sequence decreasing to [math] such that for every , we have the following convergence as (resp. )
[TABLE]
[TABLE]
If for any and , (resp. ) is defined as the maximizer of on the set , (resp. ) we have the following convergence
[TABLE]
[TABLE]
with .
The result of Corollary 1 is then a direct consequence of Theorem 1 (choosing the sequence ) and Lemma 8 applied with . ∎
5.3 Proof of Theorem 2
The proof follows the lines of the proof of Theorem 3.8 in Celisse et al. (2012). Nonetheless, our result is sharper, as we establish an upper bound on the rate of convergence (in probability) of the quantity at stake. For any , any sequence and , we write
[TABLE]
with as defined in Lemma 5. We will establish that there exist some positive constants such that for any fixed configuration , any , any positive sequence such that and and large enough, we have
[TABLE]
Combined with (5.3) and applying Lemma 5, this gives the desired result. So now we focus on establishing (20).
In what follows, we consider a fixed configuration and introduce the Hamming distance between and any other configuration defined as
[TABLE]
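To make the Hamming distance and the per-time-step difference counts used below concrete, here is a minimal Python sketch. It assumes that a configuration is stored as `z[t][i]`, the class of node `i` at time `t`, and that the distance counts the (node, time) pairs on which two configurations disagree; the exact normalization in the paper may differ. The helper names `hamming` and `per_time_differences` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical layout (assumption): z[t][i] is the class of node i at time t.
Configuration = Sequence[Sequence[int]]


def per_time_differences(z: Configuration, z_prime: Configuration) -> List[int]:
    """Number of nodes whose class differs between z and z_prime, for each time step."""
    if len(z) != len(z_prime):
        raise ValueError("configurations must cover the same time steps")
    counts = []
    for row, row_prime in zip(z, z_prime):
        if len(row) != len(row_prime):
            raise ValueError("configurations must cover the same nodes")
        counts.append(sum(a != b for a, b in zip(row, row_prime)))
    return counts


def hamming(z: Configuration, z_prime: Configuration) -> int:
    """Total number of (node, time) pairs on which the two configurations disagree."""
    return sum(per_time_differences(z, z_prime))


z = [[0, 0, 1], [0, 1, 1]]
z_prime = [[0, 1, 1], [1, 1, 0]]
print(per_time_differences(z, z_prime))  # [1, 2]
print(hamming(z, z_prime))               # 3
```

Under this convention the total distance is simply the sum of the per-time-step counts, which is how the two quantities are related in the argument that follows.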
We let denote the probability conditional on under parameter , i.e. . In the following, we will often use the fact that the variables are independent under (with mean value ) so that we can rely on Hoeffding’s Inequality. We introduce a sequence decreasing to 0 and the event defined as
[TABLE]
We bound the probability of interest in (20) by splitting it on the two complementary events and . For any and any positive sequence
[TABLE]
Thus, the proof of (20) boils down to establishing the desired upper bound on the second term appearing in the right-hand side of (21). We have
[TABLE]
by using the bound on the number of terms in the sum over (for each value of ). Then,
[TABLE]
as long as . For any configuration such that , we denote by the number of differences between the two configurations at each time step , i.e. such that . Moreover, for any parameter , we define the subset of indexes such that for which the parameter differs between the configuration and , namely
[TABLE]
with the set of indexes over which we sum to compute the conditional log-likelihood. In what follows, we abbreviate to (resp. ), the set (resp. ). The next lemma gives a decomposition of the main term at stake in (22).
Lemma 9.
We have the decomposition
[TABLE]
where
[TABLE]
Combining (22) and Lemma 9, we obtain
[TABLE]
We then decompose
[TABLE]
We handle these three terms separately in the following. From now on, we consider a configuration such that .
First term in the right-hand side of (5.3).
Recall that is given by (23). We can further decompose this term
[TABLE]
For and large enough such that (implying for the corresponding stationary distribution ), we have
[TABLE]
To handle the term , we need to lower bound the cardinality of the set . This is the purpose of Lemma 10 which is a generalization of Proposition B.4 in Celisse et al. (2012). This can be done for all the configurations and all the configurations that belong to some .
Lemma 10.
For any , any parameter , any configuration and any such that , we have
[TABLE]
Combining Lemma 10 with the previous bound, we get that
[TABLE]
We also have
[TABLE]
with for . The function is positive for every such that , hence, introducing the notation ,
[TABLE]
So, by (28), we have for large enough
[TABLE]
This leads to
[TABLE]
for any and large enough . Moreover, thanks to Hoeffding’s Inequality and Assumption 3,
[TABLE]
where is a constant depending on . Finally using Lemma 10, we have
[TABLE]
Second term in the right-hand side of (5.3).
We have
[TABLE]
For any , we introduce the sets
[TABLE]
Then we bound
[TABLE]
For every , we thus have
[TABLE]
We start by dealing with the first term of the right-hand side of (5.3). Notice that on the event , we have for every . The next lemma establishes that any set is included in a larger set, whose cardinality is bounded. In particular, the random set is included in a larger deterministic subset.
Lemma 11.
Let and denote two configurations such that . Then for any parameter , we have
[TABLE]
As the set is random (because is random), we write
[TABLE]
where now is a deterministic set. By a union bound and Hoeffding’s inequality, we have for any
[TABLE]
This leads to
[TABLE]
For the second term of (5.3), we get from a union bound and from Lemma 11 (that gives an upper bound for ) that
[TABLE]
because , implying that
[TABLE]
Finally, we have the following upper bound for the second term of (5.3)
[TABLE]
Third term in the right-hand side of (5.3).
We want to bound (in probability) the last term . Distinguishing between the cases where and , we have
[TABLE]
For any , we further introduce the sets
[TABLE]
Centering the (under the distribution ), we get
[TABLE]
Then, on the event and for and large enough such that and for every and , using the fact that for , we have
[TABLE]
Then, for every ,
[TABLE]
For the first term of (31), using Hoeffding’s inequality as before,
[TABLE]
For the second term of (31), we use
[TABLE]
Finally, we have the following upper bound for the third term of (5.3)
[TABLE]
Combining the three bounds on the right-hand side of (5.3).
[TABLE]
Now we choose the sequence such that which is sufficient to imply that the quantities and vanish as and increase. For large enough values of and and with , and positive constants only depending on and , we then have
[TABLE]
Let us introduce
[TABLE]
Now we go back to (26). Noticing that the number of configurations such that is equal to , we have
[TABLE]
Finally, notice that as long as and (resp. as long as ), the quantity (resp. ) converges to 0. Then we obtain for some universal positive constant and large enough and
[TABLE]
This leads directly to inequality (20). ∎
5.4 Proof of Theorem 3
We fix some and study the convergence in probability of to with as defined by the fixed point equation (4), i.e.
[TABLE]
First, let us denote
[TABLE]
Then we can write the quantity at stake as
[TABLE]
to obtain the following upper bound on the probability of interest
[TABLE]
First term of the right-hand side of (33).
For the first term in (33), for any (implying for any ),
[TABLE]
First, we upper bound the probability for any , using the following lemma.
Lemma 12.
If , for any , for any sequence increasing to infinity such that and any , we have for any
[TABLE]
with a sequence decreasing to [math] such that .
Then, for the second term of (5.4), notice that and . We then have, if and , using Lemma 12 again,
[TABLE]
Finally, for the first term of (33), if is such that , if and as long as , we obtain
[TABLE]
Second term of the right-hand side of (33).
For the second term of (33), we split it on two complementary events as before. For any , we have
[TABLE]
We already gave an upper bound on the second term in the right-hand side of (36). Let us give one for the first term. Notice that as and if , we have by the mean value theorem
[TABLE]
We can then write for the first term in the right-hand side of (36), as long as , for such that and with such that , still using Lemma 12
[TABLE]
We finally obtain for the second term of the right-hand side of (33)
[TABLE]
We conclude the proof by summing the upper bounds obtained in (35) and (37)
[TABLE]
and by noticing that . ∎
5.5 Proof of Corollary 3
Denoting by the permutation minimizing the distance between (permuted) and for every , i.e. , we apply Theorem 3 to in order to get
[TABLE]
∎
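The permutation used in the proof of Corollary 3 can be found by brute force when the number of classes is small. The sketch below is a minimal illustration under two assumptions of ours: the distance is the largest entrywise gap over both the transition matrix and the connectivity matrix, and a label permutation relabels rows and columns simultaneously. The function names and the numerical values are hypothetical.

```python
from itertools import permutations


def relabel(matrix, sigma):
    """Apply the label permutation sigma to both indices: entry (a, b) becomes matrix[sigma[a]][sigma[b]]."""
    size = len(matrix)
    return [[matrix[sigma[a]][sigma[b]] for b in range(size)] for a in range(size)]


def sup_gap(a, b):
    """Largest absolute entrywise difference between two square matrices of the same size."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def best_permutation(pi_hat, gamma_hat, pi_true, gamma_true):
    """Brute-force search (Q! candidates) for the relabelling that best aligns the estimates.

    Assumed distance: maximum of the sup-norm gaps on the transition and connectivity matrices.
    """
    def distance(sigma):
        return max(sup_gap(relabel(pi_hat, sigma), pi_true),
                   sup_gap(relabel(gamma_hat, sigma), gamma_true))

    best = min(permutations(range(len(pi_true))), key=distance)
    return best, distance(best)


# Made-up example: the estimates are close to the truth, but with the two labels swapped.
pi_true = [[0.9, 0.1], [0.2, 0.8]]
gamma_true = [[0.7, 0.1], [0.1, 0.4]]
pi_hat = [[0.79, 0.21], [0.1, 0.9]]
gamma_hat = [[0.41, 0.1], [0.1, 0.7]]
sigma, dist = best_permutation(pi_hat, gamma_hat, pi_true, gamma_true)
print(sigma, round(dist, 6))  # (1, 0) 0.01
```

The factorial cost is acceptable only for a handful of classes; it is meant to illustrate the "up to label switching" statement, not as a practical procedure.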
5.6 Proof of Theorem 5
We use the following lemma, which states that the criterion optimized by the VEM algorithm and the log-likelihood are asymptotically equivalent.
Lemma 13.
We have the following inequality -a.s.
[TABLE]
We have that for any , for and large enough,
[TABLE]
We then conclude by combining this result with Theorem 1. ∎
5.7 Proof of Corollary 5
This is a direct consequence of Theorem 5 and Lemma 8 applied with the functions . ∎
5.8 Proof of Theorem 6
This proof is quite similar to that of Theorem 3. We fix some and study the convergence in probability of to with as defined by the fixed point equation (5), i.e.
[TABLE]
First, let us denote
[TABLE]
Then we can write the quantity at stake as
[TABLE]
We follow the lines of the proof of Theorem 3, using Lemma 14 below instead of Lemma 12.
Lemma 14.
For any , for any sequence increasing to infinity such that and any , we have for any
[TABLE]
with a sequence decreasing to [math] such that .
∎
Acknowledgement
Work partly supported by the grant ANR-18-CE02-0010 of the French National Research Agency ANR (project EcoNet).
Appendix A Proofs of main results for the finite time case
A.1 Proof of Corollary 2
When the number of time steps is fixed and the connection probabilities vary over time, the conditional log-likelihood is
[TABLE]
and the likelihood is defined as in (2) with instead of . The maximum likelihood estimator is then
[TABLE]
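As an illustration of a conditional log-likelihood with time-varying connection probabilities, the following sketch evaluates it for a given configuration. It assumes undirected graphs without self-loops (each unordered pair counted once) and the array layouts described in the docstring; these conventions, the example values, and the name `conditional_loglik` are ours, not taken from the paper.

```python
import math


def conditional_loglik(x, z, gamma):
    """Bernoulli log-likelihood of the observed graphs given the latent configuration z.

    Assumed layouts:
    x[t][i][j] in {0, 1}: adjacency matrix at time t (symmetric, no self-loops).
    z[t][i]: class of node i at time t.
    gamma[t][q][l]: probability of an edge between classes q and l at time t.
    """
    total = 0.0
    for t, adjacency in enumerate(x):
        n = len(adjacency)
        for i in range(n):
            for j in range(i + 1, n):  # each unordered pair {i, j} is counted once
                p = gamma[t][z[t][i]][z[t][j]]
                total += math.log(p) if adjacency[i][j] else math.log(1.0 - p)
    return total


# Made-up example with one time step, three nodes and two classes.
x = [[[0, 1, 0], [1, 0, 0], [0, 0, 0]]]
z = [[0, 0, 1]]
gamma = [[[0.8, 0.1], [0.1, 0.5]]]
print(conditional_loglik(x, z, gamma))  # equals log(0.8) + 2 * log(0.9)
```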
As before, we denote the normalized log-likelihood . We introduce the following limiting quantity
[TABLE]
We follow the lines of the proof of Theorem 1 in order to prove that we have for any sequence , for all
[TABLE]
Choosing , we then use Lemma 8 to conclude that, as by assumption, for any ,
[TABLE]
In particular, for every , converges in -probability to up to label switching. Then, let us prove that on the event (whose probability converges to ), for large enough, the permutation minimizing the distance between and is the same for every . We consider large enough such that . Denoting by the permutations (depending on ) minimizing , we have that, for any , if some are such that , then
[TABLE]
and on the event we consider
[TABLE]
implying that . This means that on this event, the permutation minimizing the distance between and is the same for every . We can conclude that
[TABLE]
∎
A.2 Proof of Theorem 4
First, let us introduce some notations, as in the proof of Theorem 2. For any fixed configuration , we define for any configuration and any parameter
[TABLE]
and for any
[TABLE]
and as before, we abbreviate to (resp. ), the set (resp. ). We also introduce for any the quantities , , and as before, according to this definition of . Finally, we introduce for any and the quantities
[TABLE]
Note that a similar proof yields an analogue of Lemma 10: for any configuration in , for any configuration and any ,
[TABLE]
In the same way, we obtain an analogue of Lemma 11 (with a similar proof): for any and any two configurations at time such that and any parameter , we have
[TABLE]
Going back to the proof of Theorem 4, we follow the lines of the proof of Theorem 2, with a few changes. We get the same decomposition as in equation (26), replacing by in the definitions of , and , and replacing the event by . For , the proof does not change. For , we write (instead of (5.3))
[TABLE]
For every , we thus have
[TABLE]
We start by dealing with the first term of (A.2). Notice that on the event , we have for every . As the set is random (because is random), we write for every , using (39),
[TABLE]
where now is a deterministic set. By a union bound and Hoeffding’s inequality, we have for any
[TABLE]
This leads to, for the first term of (A.2),
[TABLE]
For the second term of (A.2), we get from a union bound and from (39) that
[TABLE]
Finally, we have the following upper bound for
[TABLE]
For the third term , denoting , we have
[TABLE]
Then, we have on the event and for large enough such that and for every and , using the fact that for ,
[TABLE]
Then, for every ,
[TABLE]
For the first term of (41), using Hoeffding’s inequality as before,
[TABLE]
and for the second term of (41),
[TABLE]
Finally, we have the following upper bound for
[TABLE]
Now we choose the sequence such that which is sufficient to imply that the quantities and vanish as increases and we gather the three upper bounds. For large enough values of and with , , , and positive constants only depending on , , and , we then have
[TABLE]
Then, introducing
[TABLE]
we conclude as in the proof of Theorem 2, noticing that (resp. ) converges to 0 as increases as long as (resp. as long as ). ∎
A.3 Proof of Corollary 6
As in the proof of Theorem 5, using the convergence in Equation (38) and Lemma 13, we obtain for any
[TABLE]
We then conclude by using Lemma 8 applied with . ∎
Appendix B Proofs of technical lemmas
B.1 Proof of Lemma 1
As in the proof of Lemma E.2 from Celisse et al. [2012], we use the method of Lagrange multipliers to find the fixed-point equation of the critical point. Recall that and let us denote the likelihood and the conditional likelihood . Recall the definition of in (1) and that
[TABLE]
We compute the derivative of the Lagrangian with respect to each parameter .
[TABLE]
At the critical point , we obtain that for each we have
[TABLE]
where means ’proportional to’. The constraint gives the normalizing term and we obtain
[TABLE]
∎
B.2 Proof of Lemma 2
We can write the quantity to optimize
[TABLE]
Using this expression, we can obtain directly the expected fixed-point equation for the variational estimator of the transition probability from to . ∎
B.3 Proof of Lemma 3
We rely on the notation introduced in the proof of Theorem 1. For any , using classical dependency rules in directed acyclic graphs and the expression (9) of , we write
[TABLE]
and thus
[TABLE]
Using Bayes’ rule, we have
[TABLE]
Taking the expectation of this quantity with respect to any distribution on , we obtain
[TABLE]
where is a Kullback-Leibler divergence (thus non-negative) and is the entropy of .
Taking now as the Dirac distribution located on , we have and
[TABLE]
Now, combining Inequalities (43) and (44), we obtain
[TABLE]
giving the expected result. ∎
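The argument in the proof of Lemma 3 rests on the decomposition log p(X) = E_Q[log p(X, Z)] + H(Q) + KL(Q || p(Z | X)), valid for any distribution Q on the latent configurations. The sketch below checks it numerically on a toy discrete model with three latent values (the joint probabilities are made up, not taken from the paper), and also checks that a Dirac distribution, whose entropy is zero, gives a lower bound equal to log p(X, z).

```python
import math


def elbo(q, joint):
    """E_Q[log p(x, Z)] + H(Q), with the convention 0 log 0 = 0."""
    return sum(qz * (math.log(pz) - math.log(qz)) for qz, pz in zip(q, joint) if qz > 0)


def kl(q, r):
    """Kullback-Leibler divergence KL(q || r) between two distributions on a finite set."""
    return sum(qz * math.log(qz / rz) for qz, rz in zip(q, r) if qz > 0)


joint = [0.10, 0.05, 0.15]            # made-up values of p(x, z) for z = 0, 1, 2
log_px = math.log(sum(joint))           # log p(x) = log sum_z p(x, z)
posterior = [pz / sum(joint) for pz in joint]

q = [0.5, 0.2, 0.3]                     # an arbitrary distribution on the latent values
gap = log_px - (elbo(q, joint) + kl(q, posterior))
print(abs(gap) < 1e-12)                 # True: the decomposition is exact

dirac = [0.0, 0.0, 1.0]                 # Dirac mass on z = 2: zero entropy
print(elbo(dirac, joint), log_px)       # log p(x, z=2) <= log p(x)
```

Since the Kullback-Leibler term is non-negative, every choice of Q gives a lower bound on log p(X), and the Dirac choice recovers the complete-data log-likelihood.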
B.4 Proof of Lemma 4
To prove this lemma, we first establish a control of the expectation of the random variable appearing in the statement.
Lemma 15.
We have the following inequality for and any configurations and any
[TABLE]
with .
We now turn to the proof of Lemma 4. Let us first recall Talagrand’s inequality [see e.g. Massart, 2007, page 170, Equation (5.50)].
Theorem (Talagrand’s inequality).
Let denote independent and centered random variables. Define
[TABLE]
where . Let us further assume that there exist and such that for every and any and . Then, for every and , for any finite set of elements of , we have
[TABLE]
First, notice that and so that we have
[TABLE]
with . The set is finite, of size . Let us now apply Talagrand’s inequality to our setup. Note that for every , for any , we have
[TABLE]
almost surely thanks to Assumption 3, and with as defined in Lemma 15. Combining this result with Lemma 15 and writing , we have for any , for any , applying Talagrand’s inequality with and ,
[TABLE]
∎
B.5 Proof of Lemma 5
For any , Hoeffding’s inequality [see for example Theorem 2.8 from Boucheron et al., 2013] gives that
[TABLE]
which concludes the proof. ∎
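Hoeffding's inequality, used here and repeatedly in the proof of Theorem 2, bounds the tail of a sum S of n independent variables with values in [0, 1] by P(|S - E S| >= t) <= 2 exp(-2 t^2 / n). The sketch below compares this bound with the exact two-sided tail of a binomial variable, a special case chosen for illustration only; the values of n and p are made up.

```python
import math


def binomial_two_sided_tail(n, p, t):
    """Exact P(|S - n p| >= t) for S ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1) if abs(k - n * p) >= t)


def hoeffding_bound(n, t):
    """Hoeffding bound 2 exp(-2 t^2 / n) for n independent [0, 1]-valued summands (capped at 1)."""
    return min(1.0, 2.0 * math.exp(-2.0 * t * t / n))


n, p = 200, 0.3                               # made-up example
for t in (5, 10, 20, 30):
    print(t, binomial_two_sided_tail(n, p, t), hoeffding_bound(n, t))
```

The bound is loose (even trivial) for small deviations, but its exponential decay in t^2 / n is what makes it compatible with the union bounds over configurations used in the proof of Theorem 2.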
B.6 Proof of Lemma 6
First notice that may not be unique: it is in fact a closed subset of . In what follows, we fix one element of this subset. Letting and and using Lemma 5, we can split the probability as
[TABLE]
recalling that
[TABLE]
We thus want to bound the quantity on the event , which means bounding
[TABLE]
Let us denote for any matrix of size the norm . Then note that, for any matrix with coefficients in , for any , using Assumptions 2 and 3,
[TABLE]
with . On the event we then have
[TABLE]
We then show that for any , for every and every , for any such that , there exists some such that , i.e. such that for every , . For every , we can construct as follows. On the event , for every , for any such that , we have for every . We then construct as follows and take for every .
- for choose as the closest integer to . It is in the interval so we have . Moreover, note that because .
- Repeat for
  - if choose as the smallest integer greater than or equal to .
  - if choose as the largest integer smaller than or equal to .
As before, is in the interval so we have . Moreover because . We also have (by induction)
[TABLE]
In the end, we have i.e. , meaning that , both and being integers. Then, if , there exists such that . This leads to
[TABLE]
which concludes the proof. ∎
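The integer construction in the proof of Lemma 6 is a sequential rounding scheme: the running difference between the sum of the chosen integers and the sum of the reals stays strictly between -1 and 1, so when both totals are integers the final difference is an integer in (-1, 1), hence zero. A minimal sketch, assuming the rounding direction at each step is chosen from the sign of that running difference (round up when the integers lag behind, down otherwise); the function name and the input values are hypothetical.

```python
import math


def round_preserving_sum(values):
    """Round nonnegative reals, whose total is an integer, to integers with the same total.

    Each integer differs from its real counterpart by less than 1, and the running
    difference (integers so far) - (reals so far) stays strictly between -1 and 1.
    """
    rounded = [round(values[0])]            # first entry: closest integer
    drift = rounded[0] - values[0]          # running (integer sum) - (real sum)
    for value in values[1:]:
        # Assumed rule: round up if the integers lag behind the reals, down otherwise.
        step = math.ceil(value) if drift < 0 else math.floor(value)
        rounded.append(step)
        drift += step - value
    return rounded


counts = [1.4, 2.3, 0.7, 3.6]               # made-up reals summing to 8
print(round_preserving_sum(counts))          # [1, 3, 0, 4]
```

The invariant is checked by cases: if the drift lies in (-1, 0), rounding up adds an amount in [0, 1); if it lies in [0, 1), rounding down adds an amount in (-1, 0]. Either way the new drift stays in (-1, 1).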
B.7 Proof of Lemma 7
We can upper bound the expectation as follows
[TABLE]
We have for any
[TABLE]
This implies that
[TABLE]
and identically
[TABLE]
This leads to
[TABLE]
using the fact that for every . ∎
B.8 Proof of Lemma 8
We first consider the case when , and is constant over time. We use the following lemma.
Lemma 16.
For any , we have for small enough ()
[TABLE]
This gives an upper bound on the probability of interest
[TABLE]
By definition of , we write
[TABLE]
implying that
[TABLE]
We then obtain the following upper bound, that converges to [math] as and increase by assumption,
[TABLE]
When the number of time steps is fixed and is allowed to vary over time, the proof is almost the same. Indeed, means that there exists such that and we can apply Lemma 16 to this to obtain that . This implies that , which allows us to conclude as before. ∎
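The step "by definition of the maximizer" in the proof of Lemma 8 is the standard argmax comparison. In generic notation of our own (not the paper's symbols: $M_n$ a random criterion maximized at $\hat\theta_n$, $M$ a deterministic limit maximized at $\theta^\star$), it reads as follows, where the middle bracket is non-positive by definition of $\hat\theta_n$:

```latex
\begin{aligned}
M(\theta^\star) - M(\hat\theta_n)
  &= \bigl[M(\theta^\star) - M_n(\theta^\star)\bigr]
   + \bigl[M_n(\theta^\star) - M_n(\hat\theta_n)\bigr]
   + \bigl[M_n(\hat\theta_n) - M(\hat\theta_n)\bigr] \\
  &\le 2 \sup_{\theta} \bigl| M_n(\theta) - M(\theta) \bigr| .
\end{aligned}
```

A rate for the uniform deviation therefore transfers to the gap $M(\theta^\star) - M(\hat\theta_n)$, which the proof above converts into a bound on the distance to the true parameter (up to label switching) through Lemma 16.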
B.9 Proof of Lemma 9
We have
[TABLE]
We decompose this sum as
[TABLE]
In the first sum of the right-hand side of (B.9), the terms are different from zero only for triplets in . Similarly in the last sum, the terms are different from zero for triplets in . As a consequence, we obtain
[TABLE]
We now write the last sum in the right-hand side as
[TABLE]
Distinguishing between the cases where and , we obtain
[TABLE]
In the end, we decompose
[TABLE]
which gives the result.
B.10 Proof of Lemma 10
We first notice that
[TABLE]
For every , we can apply Proposition B.4 from Celisse et al. [2012], since their Assumption (A4) only needs to hold for (see their proof) and is valid on with the constant . We obtain
[TABLE]
We conclude by noticing that .
B.11 Proof of Lemma 11
The inclusion of the sets is straightforward. Now we have
[TABLE]
B.12 Proof of Lemma 12
First, let us decompose the quantity at stake as follows
[TABLE]
and upper bound the two terms in the right-hand side of (B.12). For the first one we will follow the proof of Theorem 3.9 from Celisse et al. [2012]. Let denote a fixed configuration. We work on the set and write
[TABLE]
Then
[TABLE]
where the last inequality comes from Theorem 2 where the bound is uniform with respect to .
Now, for the second term of (B.12), we use the following lemma.
Lemma 17.
There exist such that for any , for any sequence , we have, as long as ,
[TABLE]
We then combine the two upper bounds obtained in (49) and (50) in order to conclude, the assumption being satisfied for and large enough because . We obtain the expected result, using the fact that , that increases to infinity and that ,
[TABLE]
∎
B.13 Proof of Lemma 13
We have the following inequalities by definition of , and and because the Kullback-Leibler divergence is non-negative
[TABLE]
with . We write this Kullback-Leibler divergence (from to , with such that and ) as follows
[TABLE]
We then obtain
[TABLE]
Combined with (51), this leads to the following inequality for any parameter
[TABLE]
We can conclude that
[TABLE]
∎
B.14 Proof of Lemma 14
This proof is quite similar to that of Lemma 12. For any , let us write
[TABLE]
and upper bound the two probabilities in the right-hand side of this inequality. We already proved in Lemma 12 that the second term converges to [math] thanks to the assumptions on the sequence . For the first term, let denote a fixed configuration. Let us work on the set and use the same method as in the proof of Lemma 12,
[TABLE]
leading to
[TABLE]
Then we obtain
[TABLE]
For each , we use the following lemma.
Lemma 18.
Denoting , we have the following inequality for any configuration
[TABLE]
This gives us
[TABLE]
Noticing that the assumptions on imply that
[TABLE]
we can conclude by applying the result of Theorem 2 with the estimator for both terms of the right-hand side of (52). ∎
B.15 Proof of Lemma 15
The proof follows the lines of the proof of Lemma C.3 from Celisse et al. [2012]. Let denote the expectation given , i.e. . Introducing a ghost sample that is independent of and has the same distribution, we write
[TABLE]
where denotes the expectation with respect to under the true parameter and given . At this point, we notice that, if are independent Rademacher variables, then the random variables
[TABLE]
follow the same distribution, which implies that
[TABLE]
As a consequence, we have
[TABLE]
Then using Jensen’s inequality, Assumption 3 and the bound , we get
[TABLE]
where , concluding the proof. ∎
B.16 Proof of Lemma 16
We assume that . Without loss of generality, assume that the permutation (or one of the permutations) minimizing this distance is the identity. Let us write, using the fact that the identity matrix of size maximizes in (over the set of stochastic matrices) the quantity (see the proof of Theorem 3.6 in Celisse et al. [2012]) and denoting the coefficients of (thus depending on ),
[TABLE]
denoting the Kullback-Leibler divergence from a Bernoulli distribution with parameter to a Bernoulli distribution with parameter . For every , there exists such that because is a stochastic matrix. Using Assumption 2, we obtain
[TABLE]
thanks to a result on Kullback-Leibler divergence for Bernoulli distributions (see for instance Bubeck [2010], Chapter 10, Section 2, Lemma 10.3). We then want to show that there exist such that .
- If is a permutation, the assumption gives the expected result.
- If is not a permutation, it is not injective and there exist such that . Thanks to Assumption 1, take such that . Then
  [TABLE]
  leading to either or , using the fact that .
So, as there exist such that , we have
[TABLE]
∎
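The proof of Lemma 16 lower bounds a Kullback-Leibler divergence between Bernoulli distributions by a squared difference of parameters. One standard inequality of this type is the Pinsker-type bound kl(p, q) >= 2 (p - q)^2; whether this is exactly the form of the cited Lemma 10.3 of Bubeck [2010] is an assumption here. The sketch evaluates the divergence and checks that bound on a grid of parameters.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence from Bernoulli(p) to Bernoulli(q), for q in (0, 1), with 0 log 0 = 0."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total


# Assumed form of the bound: kl(p, q) >= 2 (p - q)^2, checked on a grid inside (0, 1).
grid = [k / 20 for k in range(1, 20)]
slack = min(kl_bernoulli(p, q) - 2 * (p - q) ** 2 for p in grid for q in grid)
print(slack >= -1e-12)  # True
```

Near p = q = 1/2 the two sides agree to second order, which is why the slack is smallest there.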
B.17 Proof of Lemma 17
For any node , the Markov chain is geometrically ergodic because its transition matrix satisfies Doeblin’s condition thanks to Assumption 2. For any , let us denote the Dirac mass at . There exists a positive constant and some such that and , we have
[TABLE]
where is the total variation norm. This leads to
[TABLE]
We now consider the Markov chain of the nodes evolving through time. Note that it is irreducible and aperiodic. Moreover, its transition matrix is given by , the -th Kronecker power of and its stationary distribution is . For any , let us denote . For every , we can decompose
[TABLE]
We use
[TABLE]
So, reorganizing the terms, we write
[TABLE]
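Since the node chains are independent with the same transition matrix, the joint chain of all nodes has the Kronecker power of that matrix as transition matrix and the product of the one-node stationary distributions as stationary distribution. A small pure-Python check of this fact, with a made-up two-class transition matrix and three nodes:

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a[r1][c1] * b[r2][c2] for c1 in range(len(a[0])) for c2 in range(len(b[0]))]
            for r1 in range(len(a)) for r2 in range(len(b))]


def kron_vec(u, v):
    """Kronecker product of two vectors."""
    return [ui * vj for ui in u for vj in v]


def row_times_matrix(v, m):
    """Row vector v multiplied by matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


P = [[0.9, 0.1], [0.3, 0.7]]   # made-up one-node transition matrix
mu = [0.75, 0.25]              # its stationary distribution: mu P = mu
n_nodes = 3

P_joint, mu_joint = [[1.0]], [1.0]
for _ in range(n_nodes):
    P_joint = kron(P_joint, P)          # n-th Kronecker power: 2**3 = 8 joint states
    mu_joint = kron_vec(mu_joint, mu)   # product stationary distribution

error = max(abs(a - b) for a, b in zip(row_times_matrix(mu_joint, P_joint), mu_joint))
print(len(P_joint), error < 1e-12)  # 8 True
```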
Let us recall the definition of an -mixing time. For any Markov transition matrix over the set with stationary distribution , for any , the -mixing time of the Markov chain is defined as
[TABLE]
Denoting by the -mixing time of the Markov chain , we thus obtain
[TABLE]
Now, we introduce a new Markov chain , that is defined by
[TABLE]
Notice that it is irreducible and aperiodic, with stationary distribution defined for every state by
[TABLE]
It is easily seen that for any , its -mixing time equals . We apply Theorem 3 from Chung et al. [2012], for any , considering the weight function for every (of expectation under the stationary distribution). Then , and denoting , we obtain that there exist such that for any , as long as
[TABLE]
∎
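For a concrete feel of the ε-mixing time used in the proof of Lemma 17, the sketch below computes it by brute force for a small chain. It assumes the usual definition t_mix(ε) = min{t >= 0 : max_x ||P^t(x, ·) - μ||_TV <= ε}, with the total variation distance taken as half the l1 distance; the transition matrix is a made-up example and the function names are ours.

```python
def tv_distance(u, v):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(u, v))


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mixing_time(p, mu, eps, t_max=10_000):
    """Smallest t such that every row of p^t is within eps of mu in total variation."""
    size = len(p)
    p_t = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]  # p^0
    for t in range(t_max + 1):
        if max(tv_distance(row, mu) for row in p_t) <= eps:
            return t
        p_t = mat_mul(p_t, p)
    raise RuntimeError("no mixing within t_max steps")


P = [[0.9, 0.1], [0.3, 0.7]]   # made-up transition matrix
mu = [0.75, 0.25]              # its stationary distribution
print(mixing_time(P, mu, 0.25))  # 3
```

For this two-state chain the worst-case distance equals 0.75 · 0.6^t, a geometric decay of the kind guaranteed by Doeblin's condition, which gives t_mix(1/4) = 3.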
B.18 Proof of Lemma 18
For any configuration ,
[TABLE]
the third inequality being true because by definition minimizes over the set of variational distributions. ∎
References
- Bartolucci et al. [2018] F. Bartolucci, M. F. Marino, and S. Pandolfi. Dealing with reciprocity in dynamic stochastic block models. Comput. Stat. Data Anal., 123(C):86–100, 2018.
- Becker and Holzmann [2018] A.-K. Becker and H. Holzmann. Nonparametric identification in the dynamic stochastic block model. arXiv e-prints, page arXiv:1811.00934, Nov. 2018.
- Bickel et al. [2013] P. Bickel, D. Choi, X. Chang, and H. Zhang. Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann. Statist., 41(4):1922–1943, 08 2013.
- Boucheron et al. [2013] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. OUP Oxford, 2013.
- Bubeck [2010] S. Bubeck. Jeux de bandits et fondations du clustering. PhD thesis, Université Lille 1, 2010.
- Celisse et al. [2012] A. Celisse, J.-J. Daudin, and L. Pierre. Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Statist., 6:1847–1899, 2012.
- Chung et al. [2012] K.-M. Chung, H. Lam, Z. Liu, and M. Mitzenmacher. Chernoff-Hoeffding bounds for Markov chains: generalized and simplified. In C. Dürr and T. Wilke, editors, 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012), volume 14 of Leibniz International Proceedings in Informatics (LIPIcs), pages 124–135, Dagstuhl, Germany, 2012. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
- Daudin et al. [2008] J.-J. Daudin, F. Picard, and S. Robin. A mixture model for random graphs. Statistics and Computing, 18(2):173–183, Jun 2008.
