A mixture model approach for clustering bipartite networks

Isabella Gollini

arXiv:1905.02659 · stat.AP · July 19, 2019

TL;DR

This paper introduces a flexible mixture model for clustering bipartite networks, capturing latent groups and individual propensities, and demonstrates its application on terrorist network data to identify key groups and traits.

Contribution

It presents a novel mixture model approach for bipartite network clustering that accounts for both group structure and individual variability, estimated efficiently via variational inference.

Findings

1. Identified main latent groups of terrorists.
2. Estimated latent trait scores for individuals.
3. Demonstrated model's effectiveness on real terrorist network data.

Abstract

This chapter investigates the latent structure of bipartite networks via a model-based clustering approach which is able to capture both latent groups of sending nodes and latent variability in the propensity of sending nodes to create links with receiving nodes within each group. This modelling approach is very flexible and can be estimated using fast inferential approaches such as variational inference. We apply this model to the analysis of a terrorist network in order to identify the main latent groups of terrorists and their latent trait scores based on their attendance at a series of events.

Tables

Table 1: BIC results for standard and constrained (common $\mathbf{w}_{r}$) MLTA models with different numbers of groups and dimensions.

	$D = 0$	$D = 1$	$D = 1$, common $\mathbf{w}_{r}$	$D = 2$	$D = 2$, common $\mathbf{w}_{r}$	$D = 3$	$D = 3$, common $\mathbf{w}_{r}$
$G = 2$	2062	2138	2034	2389	2096	2793	2311
$G = 3$	2157	2403	2115	2876	2229	3417	2434
$G = 4$	2290	2730	2249	3385	2385	4419	2595

Equations

$$Y_{nr}=\begin{cases}1 & \text{if } n\sim r,\\ 0 & \text{if } n\not\sim r.\end{cases}$$

$$\mathbf{z}_{n}\sim\mathrm{Multinomial}(1,(\eta_{1},\eta_{2},\ldots,\eta_{G}))$$

$$p(\mathbf{y})=\prod_{n=1}^{N}\sum_{g=1}^{G}\eta_{g}\,p(y_{n1},\ldots,y_{nR}\mid z_{ng}=1)=\prod_{n=1}^{N}\sum_{g=1}^{G}\eta_{g}\int p(y_{n1},\ldots,y_{nR}\mid\boldsymbol{\theta}_{n},z_{ng}=1)\,p(\boldsymbol{\theta}_{n})\,d\boldsymbol{\theta}_{n}$$

$$p(y_{n1},\ldots,y_{nR}\mid\boldsymbol{\theta}_{n},z_{ng}=1)=\prod_{r=1}^{R}p(y_{nr}\mid\boldsymbol{\theta}_{n},z_{ng}=1)=\prod_{r=1}^{R}\left(\pi_{rg}(\boldsymbol{\theta}_{n})\right)^{y_{nr}}\left(1-\pi_{rg}(\boldsymbol{\theta}_{n})\right)^{1-y_{nr}},$$

$$\pi_{rg}(\boldsymbol{\theta}_{n})=p(y_{nr}=1\mid\boldsymbol{\theta}_{n},z_{ng}=1)=\frac{1}{1+\exp[-(b_{rg}+\mathbf{w}_{rg}^{T}\boldsymbol{\theta}_{n})]},\quad 0\leq\pi_{rg}(\boldsymbol{\theta}_{n})\leq 1.$$

$$\pi_{rg}(\boldsymbol{\theta}_{n})=\frac{1}{1+\exp[-(b_{rg}+\mathbf{w}_{r}^{T}\boldsymbol{\theta}_{n})]},\quad 0\leq\pi_{rg}(\boldsymbol{\theta}_{n})\leq 1,$$

$$\mathrm{BIC}=-2\ell_{\mathrm{GH}}+k\log(N),$$

$$\log\{\mathit{lift}\}=\log\left\{\frac{P(y_{nr}=1,y_{nk}=1\mid z_{ng}=1)}{P(y_{nr}=1\mid z_{ng}=1)\,P(y_{nk}=1\mid z_{ng}=1)}\right\}$$

$$\pi_{rg}(\mathbf{0})=p(y_{nr}=1\mid\boldsymbol{\theta}_{n}=\mathbf{0},z_{ng}=1)=\frac{1}{1+\exp(-b_{rg})}.$$

Taxonomy

Topics: Bayesian Methods and Mixture Models · Terrorism, Counterterrorism, and Political Violence · Census and Population Estimation

Full text

Isabella Gollini, University College Dublin, Belfield, Dublin, Ireland

1 Introduction

In recent years, there has been a growing interest in the analysis of network data. Network models have been successfully applied to many different research areas. We refer to Salter-Townshend et al. (1) for a general overview of the statistical models and methods for networks.

In this chapter we focus on finding clusters in a particular class of networks called bipartite networks. Bipartite networks consist of nodes belonging to two disjoint and independent sets, called sending and receiving nodes, such that every edge can only connect a sending node (e.g., actor) to a receiving node (e.g., event).

Latent variable models have been used to model the unobserved group structure of bipartite networks by treating sending nodes as observations and receiving nodes as observed variables (see, for example, (2, 3)). One important issue limiting the use of classical latent variable approaches, such as latent class analysis (4, 5) and stochastic blockmodels (6), is the assumption of local independence within the groups, which, in the presence of a large heterogeneous network, may yield an overestimated number of groups, making the results more difficult to interpret and potentially misleading. Aitkin et al. (4) proposed several models to overcome the issue of the local independence assumption, including the random Rasch latent class model, which uses class- and event-specific parameters; these, however, are not able to capture the within-class behaviour of each actor. Furthermore, the computational effort required to estimate the model they propose is significant, which makes inference infeasible for large networks.

This chapter concerns the identification of groups in bipartite networks consisting of a set of actors and a set of events through a statistical mixture modelling approach. The approach assumes the existence of a latent trait that describes the dependence structure between events within actor groups and therefore captures the heterogeneity of actors’ behaviour within groups. This modelling framework allows for: model selection procedures for estimating the number of groups; explanation of the dependence structure of events in each group; and description of the behaviour of each actor within each group by quantifying, through the latent trait, the conditional probability that a certain actor belonging to a certain group will attend a certain event. The posterior estimates of the latent trait scores can be visualised so as to interpret the estimated latent traits within each group. To fit the model, variational inferential approaches are applied (see Tipping (7) and Gollini and Murphy (8) for a comparison of estimates given by the variational and other approaches in latent trait models). The code is included in the lvm4net package (9) for R (10). The rest of this chapter is organised as follows: in Section 2 we describe the model and the inferential approach. In Section 3 we apply the proposed methodology to the Noordin Top terrorist bipartite network (4), with the aim of identifying clusters of terrorists based on their attendance at a series of events in Indonesia from 2001 to 2010. We conclude in Section 4 with some final remarks.

2 Model-based Clustering for Bipartite Networks

The relational structure of a bipartite network graph can be described by a random incidence matrix $\mathbf{Y}$ on $N$ sending nodes (i.e. actors), $R$ receiving nodes (i.e. events) and a set of edges $\{Y_{nr}:n=1,\dots,N;r=1,\dots,R\}$ , where:

$$Y_{nr}=\begin{cases}1 & \text{if } n\sim r,\\ 0 & \text{if } n\not\sim r.\end{cases}$$
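As a minimal sketch of this representation, the following Python snippet builds a binary incidence matrix from a list of (actor, event) attendance pairs. The toy pairs and the function name `incidence_matrix` are illustrative assumptions; they are not taken from the Noordin Top data or from the lvm4net package.

```python
def incidence_matrix(edges, n_actors, n_events):
    """Return an N x R list of lists Y with Y[n][r] = 1 if actor n is linked to event r."""
    Y = [[0] * n_events for _ in range(n_actors)]
    for n, r in edges:
        Y[n][r] = 1
    return Y

# Hypothetical toy network: 3 actors, 4 events (0-based indices).
edges = [(0, 0), (0, 2), (1, 2), (1, 3)]
Y = incidence_matrix(edges, n_actors=3, n_events=4)
for row in Y:
    print(row)
```

In this toy example the third actor has no links at all, so its row is all zeros.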

To cluster bipartite networks we adapt a flexible model-based clustering approach for categorical data, the mixture of latent trait analyzers (MLTA) model introduced by Gollini and Murphy (8), to the context of bipartite network data. The MLTA model is a mixture model for binary data where observations are not necessarily conditionally independent given the group memberships. In fact, the observations within groups are modelled using a latent trait analysis model and thus dependence is accommodated. The MLTA model generalizes latent class analysis and latent trait analysis by assuming that a set of $N$ sending nodes can be partitioned into $G$ groups, and that the propensity of each actor to create links to the $R$ receiving nodes depends on both the group they belong to and a $D$-dimensional continuous latent variable $\boldsymbol{\theta}_{n}$ .

The model assumes that each sending node comes from one of $G$ unobserved groups and defines $\mathbf{z}_{n}=(z_{n1},z_{n2},\ldots,z_{nG})$ as an indicator of the group membership, $z_{ng}=1$ if actor $n$ is from group $g$ , with the following distribution:

$$\mathbf{z}_{n}\sim\mathrm{Multinomial}(1,(\eta_{1},\eta_{2},\ldots,\eta_{G})),$$

where $\eta_{g}$ is the prior probability of a randomly chosen observation coming from group $g$ ( $\sum_{g^{\prime}=1}^{G}\eta_{g^{\prime}}=1$ and $\eta_{g}\geq 0$ $\forall\,g=1,\ldots,G$ ). Further, the conditional distribution of $y_{n1},\ldots,y_{nR}$ given that the observation is from group $g$ is assumed to be a latent trait model with parameters $b_{rg}$ and $\mathbf{w}_{rg}$ .

Thus, the likelihood of the MLTA model is defined as,

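$$p(\mathbf{y})=\prod_{n=1}^{N}\sum_{g=1}^{G}\eta_{g}\,p(y_{n1},\ldots,y_{nR}\mid z_{ng}=1)=\prod_{n=1}^{N}\sum_{g=1}^{G}\eta_{g}\int p(y_{n1},\ldots,y_{nR}\mid\boldsymbol{\theta}_{n},z_{ng}=1)\,p(\boldsymbol{\theta}_{n})\,d\boldsymbol{\theta}_{n},$$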

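where the conditional distribution of $y_{n1},\ldots,y_{nR}$ given $\boldsymbol{\theta}_{n}$ and $z_{ng}=1$ is a product of Bernoulli distributions:

$$p(y_{n1},\ldots,y_{nR}\mid\boldsymbol{\theta}_{n},z_{ng}=1)=\prod_{r=1}^{R}p(y_{nr}\mid\boldsymbol{\theta}_{n},z_{ng}=1)=\prod_{r=1}^{R}\left(\pi_{rg}(\boldsymbol{\theta}_{n})\right)^{y_{nr}}\left(1-\pi_{rg}(\boldsymbol{\theta}_{n})\right)^{1-y_{nr}},$$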

and the response function for each group $\pi_{rg}(\boldsymbol{\theta}_{n})$ is defined as the following logistic function:

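$$\pi_{rg}(\boldsymbol{\theta}_{n})=p(y_{nr}=1\mid\boldsymbol{\theta}_{n},z_{ng}=1)=\frac{1}{1+\exp[-(b_{rg}+\mathbf{w}_{rg}^{T}\boldsymbol{\theta}_{n})]},\quad 0\leq\pi_{rg}(\boldsymbol{\theta}_{n})\leq 1.$$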

In addition, it is assumed that the $D$ -dimensional latent variable $\boldsymbol{\theta}_{n}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ .

The attractiveness of receiving node $r$ for sending nodes belonging to group $g$ is modelled by the parameter $b_{rg}$ . The parameter $\mathbf{w}_{rg}$ measures the heterogeneity of the behaviour of sending nodes belonging to group $g$ to connect to the receiving node $r$ (i.e., the heterogeneity of terrorists belonging to the latent group $g$ in attending event $r$ ); it also accounts for the dependence between receiving nodes. The vector $\boldsymbol{\theta}_{n}$ contains the latent variables explaining the propensity of forming links for sending node $n$ , i.e., the propensity of terrorist $n$ to attend the events.
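To make the roles of $\eta_{g}$, $b_{rg}$, $\mathbf{w}_{rg}$ and $\boldsymbol{\theta}_{n}$ concrete, the following Python sketch simulates a small incidence matrix from the generative model described above. All parameter values and the function names `response_prob` and `simulate_mlta` are hypothetical choices made for illustration; this is not the lvm4net implementation.

```python
import math
import random

def response_prob(b_rg, w_rg, theta):
    """Logistic response function: 1 / (1 + exp(-(b_rg + w_rg' theta)))."""
    eta = b_rg + sum(w * t for w, t in zip(w_rg, theta))
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_mlta(N, eta, b, w, D, rng):
    """Draw group labels, latent traits and an N x R binary incidence matrix.

    eta: G mixing proportions; b[g][r]: intercepts; w[g][r]: length-D slope vectors.
    """
    G, R = len(eta), len(b[0])
    z, theta, Y = [], [], []
    for _ in range(N):
        g = rng.choices(range(G), weights=eta)[0]     # z_n ~ Multinomial(1, eta)
        th = [rng.gauss(0.0, 1.0) for _ in range(D)]  # theta_n ~ N(0, I)
        row = [int(rng.random() < response_prob(b[g][r], w[g][r], th))
               for r in range(R)]                     # y_nr ~ Bernoulli(pi_rg(theta_n))
        z.append(g)
        theta.append(th)
        Y.append(row)
    return z, theta, Y

# Hypothetical parameters: G = 2 groups, R = 3 events, D = 1 latent dimension.
rng = random.Random(1)
eta = [0.6, 0.4]
b = [[-1.0, -2.0, 0.5], [0.5, -1.0, -2.0]]
w = [[[1.0], [0.5], [-1.0]], [[0.3], [1.2], [0.0]]]
z, theta, Y = simulate_mlta(50, eta, b, w, D=1, rng=rng)
print(sum(map(sum, Y)), "links among", len(Y), "simulated actors")
```

Setting every slope to zero in this sketch removes the within-group dependence, which corresponds to the latent class case.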

We also use a constrained model with common variable-specific slope parameters across groups (i.e. $\mathbf{w}_{rg}=\mathbf{w}_{rg^{\prime}}=\mathbf{w}_{r}$ , where $g\neq g^{\prime}$ ):

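$$\pi_{rg}(\boldsymbol{\theta}_{n})=\frac{1}{1+\exp[-(b_{rg}+\mathbf{w}_{r}^{T}\boldsymbol{\theta}_{n})]},\quad 0\leq\pi_{rg}(\boldsymbol{\theta}_{n})\leq 1.$$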

This model is particularly useful to avoid the estimation of too many parameters, especially when the data set is complex, with actors coming from several latent groups and the continuous latent variable having high dimensionality.
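The saving in parameters can be illustrated with a rough count. The Python sketch below simply counts intercepts, slopes and mixing proportions; it is an assumption-laden raw count that ignores any identifiability correction on the slopes, so it need not coincide with the number of free parameters $k$ used by the author in the BIC.

```python
def raw_parameter_count(G, R, D, common_slopes=False):
    """Raw count of intercepts, slopes and mixing proportions (no identifiability correction)."""
    intercepts = G * R                               # one b_rg per group and event
    slopes = R * D if common_slopes else G * R * D   # shared w_r, or one w_rg per group
    mixing = G - 1                                   # mixing proportions sum to one
    return intercepts + slopes + mixing

# Illustration with G = 2 groups and R = 45 events.
for D in (1, 2, 3):
    full = raw_parameter_count(2, 45, D)
    constrained = raw_parameter_count(2, 45, D, common_slopes=True)
    print(f"D={D}: unconstrained={full}, common slopes={constrained}")
```

Under this raw count the gap between the two models grows with both $G$ and $D$, which is the situation in which the constrained model is most useful.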

The likelihood of the MLTA model is computationally intractable. For this reason Gollini and Murphy (8) proposed a double EM algorithm with a variational approximation of the likelihood to fit this model, also guaranteeing fast convergence. The main aim of this variational approach is to maximize the Jaakkola & Jordan (11) lower bound of the likelihood function. This lower bound is a function of auxiliary parameters, called variational parameters, that are optimised to tighten the bound. The standard errors of the model parameters can be calculated using the jackknife method (12). For full details of the double EM algorithm we refer to (8).
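The bound at the heart of this approach can be illustrated on a single logistic term. The Python sketch below evaluates the well-known Jaakkola & Jordan lower bound $\sigma(x)\geq\sigma(\xi)\exp\{(x-\xi)/2-\lambda(\xi)(x^{2}-\xi^{2})\}$ with $\lambda(\xi)=(\sigma(\xi)-1/2)/(2\xi)$, where $\xi$ plays the role of a variational parameter. How this bound enters the double EM algorithm is not reproduced here; the symbol $\xi$ and the function names are assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def jj_lower_bound(x, xi):
    """Jaakkola-Jordan lower bound on sigmoid(x) for a variational parameter xi > 0."""
    lam = (sigmoid(xi) - 0.5) / (2.0 * xi)
    return sigmoid(xi) * math.exp((x - xi) / 2.0 - lam * (x * x - xi * xi))

xi = 2.0
for x in (-3.0, -2.0, 0.0, 1.0, 2.0, 4.0):
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.4f}  bound={jj_lower_bound(x, xi):.4f}")
```

The bound touches the logistic function at $x=\pm\xi$ and lies below it elsewhere, which is why optimising the variational parameters tightens the approximation.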

Since an EM approach is adopted, the algorithm may converge to a local maximum rather than the global maximum of the approximate likelihood. For this reason, it is generally advisable to run the algorithm several times using different initial values and to select the solution with the maximum approximate likelihood. The variational approach makes the estimation procedure much more efficient than most classical simulation-based estimation methods, even when multiple starts are employed.

However, the approximation of the log-likelihood obtained by using the variational approach with the Jaakkola & Jordan lower bound is always less than or equal to the true log-likelihood. Therefore, before performing likelihood-based model selection, for example with the Bayesian Information Criterion ( $\mathrm{BIC}$ ) (13), it may be advantageous to obtain a more accurate estimate of the log-likelihood at the last step of the algorithm using Gauss-Hermite quadrature (8).
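For a one-dimensional latent trait, the marginal log-likelihood can be approximated by integrating the Bernoulli product over $\theta_{n}$ numerically. The Python sketch below does this with a plain midpoint rule on a fine grid, used here as a simple stand-in for Gauss-Hermite quadrature so that no external library is needed. The parameter values, the toy data and the function names are hypothetical.

```python
import math
from itertools import product

def normal_grid(n_points=2001, lim=8.0):
    """Midpoint grid on [-lim, lim] with standard-normal weights (they sum to about 1)."""
    h = 2.0 * lim / n_points
    nodes = [-lim + (i + 0.5) * h for i in range(n_points)]
    weights = [h * math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi) for t in nodes]
    return nodes, weights

def pattern_prob(y, b_g, w_g, nodes, weights):
    """Approximate p(y_n1, ..., y_nR | z_ng = 1) for D = 1."""
    total = 0.0
    for t, wt in zip(nodes, weights):
        p = 1.0
        for y_r, b, w in zip(y, b_g, w_g):
            pi = 1.0 / (1.0 + math.exp(-(b + w * t)))
            p *= pi if y_r == 1 else 1.0 - pi
        total += wt * p
    return total

def log_likelihood(Y, eta, b, w, nodes, weights):
    """Sum over actors of log sum_g eta_g p(y_n | z_ng = 1)."""
    G = len(eta)
    return sum(
        math.log(sum(eta[g] * pattern_prob(y, b[g], w[g], nodes, weights) for g in range(G)))
        for y in Y
    )

# Hypothetical parameters: G = 2 groups, R = 3 events, D = 1.
nodes, weights = normal_grid()
eta = [0.6, 0.4]
b = [[-1.0, -2.0, 0.5], [0.5, -1.0, -2.0]]
w = [[1.0, 0.5, -1.0], [0.3, 1.2, 0.0]]
Y = [[1, 0, 1], [0, 0, 0], [1, 1, 0]]
loglik = log_likelihood(Y, eta, b, w, nodes, weights)
print(round(loglik, 4))

# Sanity check: within a group, the probabilities of all 2^R patterns sum to one.
total = sum(pattern_prob(list(y), b[0], w[0], nodes, weights)
            for y in product([0, 1], repeat=3))
```

A quadrature rule with far fewer nodes would normally be preferred in practice; the dense grid is chosen here only for transparency.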

3 Noordin Top Terrorist Network

The Noordin Top terrorist network data (14), displayed in Figure 1, form a bipartite network oriented around the Malaysian Muslim extremist Noordin Mohammad Top (ID: 54) and his collaborators (the dataset is available in the manet package (15) for R). The data include relational information on $N=79$ sending nodes, which are individuals belonging to terrorist/insurgent organizations, and on $R=45$ receiving nodes, which represent events in Indonesia and nearby areas from 2001 to 2010. The incidence matrix contains links encoding the attendance of the terrorists at the events.

3.1 Statistical Analysis

We apply the MLTA modelling approach to the Noordin Top terrorist network. To reduce the risk of obtaining estimates affected by convergence to a local maximum, we use ten random starts of the algorithm and select only the estimates corresponding to the maximum likelihood value. The model parameters $b_{rg}$ and $\mathbf{w}_{rg}$ are initialized with values randomly generated from a $\mathcal{N}(\mathbf{0},\mathbf{I})$ distribution, and the variational parameters are initialized to 20 in order to reduce the dependence of the final estimates on the initial values.

The model is fitted with the number of groups $G$ ranging from 2 to 4 and the dimension $D$ of the continuous latent variable ranging from 0 to 3. For $D=0$ the MLTA model reduces to a latent class analysis where the observations are assumed to be conditionally independent given the group membership. Model selection is performed on both the unconstrained MLTA and the constrained model with common slope.

The Bayesian Information Criterion ( $\mathrm{BIC}$ ) (13) is used to select the best model, and it is defined as:

$$\mathrm{BIC}=-2\ell_{\mathrm{GH}}+k\log(N),$$

where $\ell_{\mathrm{GH}}$ is the estimate of the log-likelihood at the last step of the algorithm obtained by using Gauss-Hermite quadrature, $k$ is the number of free parameters in the model and $N$ is the number of sending nodes. The model with the lowest value of $\mathrm{BIC}$ is preferred.

Table 1 shows the BIC values for models with increasing numbers of groups and latent dimensions. The selected model, with the lowest BIC (2034), has two groups, a one-dimensional latent trait and a common slope across groups.
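The selection step can be reproduced directly from the values reported in Table 1. In the Python sketch below, the dictionary keys (number of groups, latent dimension, whether the slopes are common across groups) reflect our reading of the table layout, and the helper `bic` is a plain transcription of the criterion with the sign convention used in this chapter (lower is better).

```python
import math

def bic(loglik, k, n):
    """BIC = -2 * log-likelihood + k * log(n); lower values are preferred."""
    return -2.0 * loglik + k * math.log(n)

# BIC values from Table 1, keyed by (G, D, common_slopes).
bic_table = {
    (2, 0, False): 2062, (2, 1, False): 2138, (2, 1, True): 2034,
    (2, 2, False): 2389, (2, 2, True): 2096, (2, 3, False): 2793, (2, 3, True): 2311,
    (3, 0, False): 2157, (3, 1, False): 2403, (3, 1, True): 2115,
    (3, 2, False): 2876, (3, 2, True): 2229, (3, 3, False): 3417, (3, 3, True): 2434,
    (4, 0, False): 2290, (4, 1, False): 2730, (4, 1, True): 2249,
    (4, 2, False): 3385, (4, 2, True): 2385, (4, 3, False): 4419, (4, 3, True): 2595,
}

best = min(bic_table, key=bic_table.get)
print("Selected (G, D, common slopes):", best, "with BIC", bic_table[best])
```

For every combination of $G$ and $D\geq 1$ in the table, the common-slope model has a lower BIC than its unconstrained counterpart.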

For the best model selected, the values of the mixing proportions are: $\eta_{1}=0.57$ (SE = 0.080) for Group 1, and $\eta_{2}=0.43$ (SE = 0.084) for Group 2.

3.2 Interpreting the Actor’s Behaviour

The sending nodes are partitioned into the two groups according to the maximum a posteriori (MAP) probability of group membership. Figure 2 shows the posterior probability of each actor belonging to each group.
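The MAP allocation can be sketched as follows in Python, assuming that the group-conditional likelihoods $p(y_{n1},\ldots,y_{nR}\mid z_{ng}=1)$ have already been computed. The likelihood values used in the example and the function names are hypothetical; only the mixing proportions are taken from the fitted model reported above.

```python
def posterior_membership(eta, group_likelihoods):
    """Posterior P(z_ng = 1 | y_n), proportional to eta_g * p(y_n | z_ng = 1)."""
    joint = [e * l for e, l in zip(eta, group_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def map_group(posterior):
    """Index of the group with the largest posterior probability."""
    return max(range(len(posterior)), key=posterior.__getitem__)

eta = [0.57, 0.43]  # estimated mixing proportions of the selected model

# Hypothetical actor whose attendance pattern is far more likely under the second group.
post = posterior_membership(eta, [1e-9, 4e-6])
print([round(p, 4) for p in post], "-> MAP group index", map_group(post))

# Hypothetical actor whose pattern is equally likely under both groups:
# the posterior then equals the prior mixing proportions.
post_equal = posterior_membership(eta, [1e-5, 1e-5])
print([round(p, 4) for p in post_equal], "-> MAP group index", map_group(post_equal))
```

Group indices in the sketch are 0-based, so index 0 corresponds to Group 1 in the text.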

Most of the terrorists have been assigned to a particular group with probability very close to 1. In particular, Noordin Top (ID 54), who attended 23 events, and Azhari Husin (ID 21), who attended 17 events, are both allocated to Group 1 with probability 1. The ‘lone wolves’ (IDs 75, 76, 77, 78, 79), i.e., terrorists who did not attend any event, have been assigned to Group 1, but the uncertainty associated with their group membership is very large: their posterior probability of belonging to Group 1 is 0.6.

In order to have a deeper understanding of group memberships we can use the information provided by the posterior distribution of the latent trait score $\theta_{n}$ conditional on the observation belonging to a particular group which can be obtained from the model estimates (see Figure 3).

The posterior mean estimates of $\theta_{n}$ , together with the information about event attendance $y_{nr}$ , can be used to interpret the latent variables within each group. Figure 4 shows that in Group 1 the terrorists with high values attended events 7, 14, and 22, and most of the terrorists with low values attended events 13, 26, 34, and 42. In Group 2, positive values are assigned to those terrorists who attended event 1 (note that none of the terrorists in Group 1 attended event 1), while negative values of the latent trait are associated with terrorists who attended events 2, 9, 25, and 33.

3.3 Interpreting the Events Attendance

A measure of the heterogeneity of attendance at event $r$ within group $g$ is given by the slope value $\mathbf{w}_{rg}$ ; the larger the value of $\mathbf{w}_{rg}$ , the greater the differences in the probabilities of sending a link to (i.e., attending) event $r$ among actors from group $g$ .

The choice of a model with a common slope ( $w_{r1}=w_{r2}=w_{r}$ ) in all groups suggests that the latent trait has the same effect in all groups. Figure 5 shows that most of the slope parameters are non-zero, meaning that the latent trait introduces significant variation within the groups. This indicates that there is considerable variability in event attendance within the two groups and that some events are positively dependent (i.e., those attending or not attending one event tend to attend or not attend the others) while others are negatively dependent. The dependence between events $r$ and $k$ in group $g$ is given by $\mathbf{w}_{rg}^{T}\mathbf{w}_{kg}$ , and the results are shown in Figure 6. Red (blue) squares in the heatmap indicate positive (negative) dependence; the darker the square, the stronger the dependence between the two events. Figure 6 shows that the two sets of events $(1,6,7)$ and $(3,33,36)$ are positively dependent within each set and negatively dependent between sets.
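The dependence measure $\mathbf{w}_{rg}^{T}\mathbf{w}_{kg}$ is simply a matrix of inner products between slope vectors. The Python sketch below computes it for a handful of hypothetical one-dimensional slopes; the values are invented for illustration and are not the estimates shown in Figure 5.

```python
def dependence_matrix(W):
    """R x R matrix of inner products between the slope vectors W[r] (each of length D)."""
    return [[sum(a * b for a, b in zip(w_r, w_k)) for w_k in W] for w_r in W]

# Hypothetical one-dimensional slopes for four events.
W = [[1.5], [1.0], [-2.0], [0.0]]
dep = dependence_matrix(W)
for row in dep:
    print(row)
```

Events whose slopes share the same sign are positively dependent, events with slopes of opposite sign are negatively dependent, and an event with a zero slope is not dependent on any other event under this measure.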

The heatmap displayed in Figure 7 represents the values of the $\log\{\mathit{lift}\}$ (16), which can be used to quantify, within each group, the effect of the dependence on the probability of attending two events compared to the probability of attending the same two events under an independence model. Mathematically, the $\log\{\mathit{lift}\}$ for events $r$ and $k$ for actors belonging to group $g$ is defined as

$$\log\{\mathit{lift}\}=\log\left\{\frac{P(y_{nr}=1,y_{nk}=1\mid z_{ng}=1)}{P(y_{nr}=1\mid z_{ng}=1)\,P(y_{nk}=1\mid z_{ng}=1)}\right\},$$

where $r,k=1,2,\ldots,R$ and $r\neq k$ . Two independent events have $\log\{\mathit{lift}\}=0$ ; the more positively dependent two events are, the higher the value of the $\log\{\mathit{lift}\}$ . Values of the $\log\{\mathit{lift}\}$ that are much less than 0 provide evidence of negative dependence within groups. Figure 7 shows that in Group 1 there is high negative dependence between event 1 and events 3, 33, 36, and in Group 2 there is high positive dependence between events 27, 32, 34, 35, 45.
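For a one-dimensional latent trait, the probabilities entering the $\log\{\mathit{lift}\}$ are expectations of the response functions over $\theta_{n}\sim\mathcal{N}(0,1)$. The Python sketch below approximates them with a midpoint rule on a fine grid; this integration scheme, the parameter values and the function names are assumptions made for illustration, not the computation used to produce Figure 7.

```python
import math

def expect_normal(f, n_points=2001, lim=8.0):
    """Approximate E[f(theta)] for theta ~ N(0, 1) with a midpoint rule on [-lim, lim]."""
    h = 2.0 * lim / n_points
    total = 0.0
    for i in range(n_points):
        t = -lim + (i + 0.5) * h
        total += f(t) * math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi) * h
    return total

def log_lift(b_r, w_r, b_k, w_k):
    """log of P(y_r = 1, y_k = 1 | g) / (P(y_r = 1 | g) * P(y_k = 1 | g)) for D = 1."""
    def pi_r(t):
        return 1.0 / (1.0 + math.exp(-(b_r + w_r * t)))

    def pi_k(t):
        return 1.0 / (1.0 + math.exp(-(b_k + w_k * t)))

    joint = expect_normal(lambda t: pi_r(t) * pi_k(t))
    return math.log(joint / (expect_normal(pi_r) * expect_normal(pi_k)))

# Hypothetical intercepts and slopes for pairs of events in one group.
print("same-sign slopes:    ", round(log_lift(-2.0, 1.5, -2.0, 1.5), 3))
print("opposite-sign slopes:", round(log_lift(-2.0, 1.5, -2.0, -1.5), 3))
print("one zero slope:      ", round(log_lift(-2.0, 0.0, -1.0, 1.5), 3))
```

Slopes of the same sign give a positive value, slopes of opposite sign give a negative value, and a zero slope for either event gives a value of essentially zero, matching the interpretation given in the text.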

The attractiveness of event $r$ for actors belonging to group $g$ is modelled by $b_{rg}$ . Figure 8 shows that most of the values are significantly negative highlighting the sparse structure of the network.

Since $\theta_{n}\sim\mathcal{N}(0,1)$ , the probability that the median individual in group $g$ attends event $r$ can be calculated from the attractiveness parameters through the relationship:

$$\pi_{rg}(0)=p(y_{nr}=1\mid\theta_{n}=0,z_{ng}=1)=\frac{1}{1+\exp(-b_{rg})}.$$
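The relationship between the intercept and the median attendance probability is a plain logistic transform, and it can be inverted with the logit. The Python sketch below shows both directions; the function names are hypothetical, and the intercepts it prints are implied by probabilities quoted in this section rather than read from the fitted model.

```python
import math

def median_attendance_prob(b_rg):
    """Attendance probability at theta_n = 0: 1 / (1 + exp(-b_rg))."""
    return 1.0 / (1.0 + math.exp(-b_rg))

def logit(p):
    """Inverse of the logistic transform: the intercept implied by a probability p."""
    return math.log(p / (1.0 - p))

# Intercepts implied by two probabilities mentioned in the text.
b_high = logit(0.438)   # largest median attendance probability reported
b_low = logit(1e-4)     # threshold used for the "very low" probabilities
print(round(b_high, 3), round(b_low, 3))
print(round(median_attendance_prob(b_high), 3))
```

A median probability of 0.438 corresponds to an intercept of roughly -0.25, while probabilities below $10^{-4}$ correspond to intercepts below roughly -9.2, which gives a sense of scale for the intercepts.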

Figure 9 makes evident the different behaviour of the actors belonging to the two groups. Actors in Group 1 have a high probability of attending events 7, 13, 14, 43, 44, 45, while those in Group 2 have a very low probability of attending these events $(<10^{-4})$ , and none of the terrorists in the data set assigned to Group 2 actually attended them. Similarly, actors in Group 2 have a high probability of attending events 1, 3, 15, 16, 17, 21, while those in Group 1 have a very low probability of attending these events $(<10^{-4})$ , and none of the terrorists in the data set assigned to Group 1 actually attended them. Overall, the probability that the median actor in each group attends any event is quite low (the highest probability is $0.438$ , for event 9 in Group 2). This is due to the fact that the number of terrorists attending the same event ranges from a minimum of 3 to a maximum of 18 out of the total of 79 terrorists.

The proposed methodology is implemented in the lvm4net package for R.

4 Conclusions

In this chapter, we have presented an application of a finite mixture model to the clustering of bipartite network data. The modelling framework is particularly flexible and useful for describing the between-group structure using a discrete latent variable and the within-group structure using a continuous latent variable. We have also illustrated how a variational inferential approach can be adopted to estimate the model efficiently. The model has been employed to analyse the relational connectivity patterns of the Noordin Top terrorist network. This has allowed us to find two main groups of terrorists based on their attendance at a series of events and has yielded important insights into both the terrorists’ behaviour within each group and the amount of dependence between the events they attended.

Bibliography

(1) Salter-Townshend M, White A, Gollini I, Murphy TB (2012) Review of statistical network analysis: models, algorithms, and software. Statistical Analysis and Data Mining 5(4):243–264
(2) Aitkin M, Vu D, Francis B (2014) Statistical modelling of the group structure of social networks. Social Networks 38:74–87
(3) Ranciati S, Vinciotti V, Wit EC (2017) Identifying overlapping terrorist cells from the Noordin Top actor-event network. arXiv preprint arXiv:1710.10319
(4) Aitkin M, Vu D, Francis B (2017) Statistical modelling of a terrorist network. Journal of the Royal Statistical Society: Series A (Statistics in Society) 180(3):751–768
(5) Bartholomew DJ, Knott M, Moustaki I (2011) Latent Variable Models and Factor Analysis: A Unified Approach, 3rd edn. Wiley
(6) Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association 96:1077–1087
(7) Tipping ME (1999) Probabilistic visualisation of high-dimensional binary data. In: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems 11, MIT Press, Cambridge, MA, USA, pp 592–598
(8) Gollini I, Murphy TB (2014) Mixture of latent trait analyzers for model-based clustering of categorical data. Statistics and Computing 24(4):569–588