Clustered Gaussian Graphical Model via Symmetric Convex Clustering
Tianyi Yao, Genevera I. Allen

TL;DR
This paper introduces a new convex clustering method for Gaussian graphical models to identify functional neuron groups from activity data, with proven convergence and demonstrated effectiveness.
Contribution
It proposes a novel symmetric convex clustering penalty within a convex optimization framework for neural data analysis.
Findings
Effective clustering on synthetic data
Successful application to real neuroscientific data
Convergence guarantees for the ADMM algorithm
Abstract
Knowledge of functional groupings of neurons can shed light on structures of neural circuits and is valuable in many types of neuroimaging studies. However, accurately determining which neurons carry out similar neurological tasks via controlled experiments is both labor-intensive and prohibitively expensive on a large scale. Thus, it is of great interest to cluster neurons that have similar connectivity profiles into functionally coherent groups in a data-driven manner. In this work, we propose the clustered Gaussian graphical model (GGM) and a novel symmetric convex clustering penalty in an unified convex optimization framework for inferring functional clusters among neurons from neural activity data. A parallelizable multi-block Alternating Direction Method of Multipliers (ADMM) algorithm is used to solve the corresponding convex optimization problem. In addition, we establish…
Click any figure to enlarge with its caption.
Figure 1| Dataset | Method | Rand Index |
|---|---|---|
| Scenario I | Clustered GGM | 0.964 (0.068) |
| k-means | 0.505 (0.003) | |
| HC Euclidean Ward | 0.498 (0.007) | |
| SC empirical corr | 0.511 (0.006) | |
| HC empirical corr Ward | 0.521 (0.027) | |
| GLasso Louvain | 0.556 (0.022) | |
| Scenario II | Clustered GGM | 0.999 (0.003) |
| k-means | 0.515 (0.01) | |
| HC Euclidean Ward | 0.791 (0.003) | |
| SC empirical corr | 0.526 (0.002) | |
| HC empirical corr Ward | 0.513 (0.023) | |
| GLasso Louvain | 0.566 (0.003) |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFace and Expression Recognition · Remote-Sensing Image Classification · Gene expression and cancer classification
MethodsAlternating Direction Method of Multipliers
Clustered Gaussian Graphical Model via Symmetric Convex Clustering
Abstract
Knowledge of functional groupings of neurons can shed light on structures of neural circuits and is valuable in many types of neuroimaging studies. However, accurately determining which neurons carry out similar neurological tasks via controlled experiments is both labor-intensive and prohibitively expensive on a large scale. Thus, it is of great interest to cluster neurons that have similar connectivity profiles into functionally coherent groups in a data-driven manner. In this work, we propose the clustered Gaussian graphical model (GGM) and a novel symmetric convex clustering penalty in an unified convex optimization framework for inferring functional clusters among neurons from neural activity data. A parallelizable multi-block Alternating Direction Method of Multipliers (ADMM) algorithm is used to solve the corresponding convex optimization problem. In addition, we establish convergence guarantees for the proposed ADMM algorithm. Experimental results on both synthetic data and real-world neuroscientific data demonstrate the effectiveness of our approach.
**Index Terms— ** Gaussian graphical model, Convex clustering, ADMM, Computational neuroscience
1 INTRODUCTION
In neuroscience, an important goal is to identify which neurons are involved in similar computations and how they are organized into functionally coherent units to carry out specific computational tasks in the brain. Such knowledge of functional organizations of neurons could lead to a better understanding of structures of interconnected neural circuits and thus the operating mechanisms of the brain. Advancement of optical imaging technologies such as calcium imaging has enabled indirect recordings of spiking activity from thousands of neurons simultaneously [1, 2]. Learning the functional organizations of large neuronal populations from such high-dimensional neural activity recording data is a major challenge in computational neuroscience.
Functional connectivity, which is defined as statistical dependence among measurements of neuronal activity in [3], has been widely used to describe functional interactions among measured neuronal populations. Because functional connectivity is not directly observable, numerous techniques such as correlations and partial correlations have been proposed to estimate such functional connectivity from neural recording data (see [3] for a comprehensive review). In this work, we define functional connectivity between each pair of recorded neurons to be their pairwise partial correlation or edges in an undirected GGM in high dimensions. Because the pairwise partial correlation between two neurons takes activities of all the other recorded neurons into account, it captures only direct associations between neurons and discard all indirect associations [3, 4], which makes pairwise partial correlation coefficient a better indicator of functional connectivity than Pearson correlation. Furthermore, because pairwise partial correlation is the same as the corresponding off-diagonal entries of the standardized precision matrix, the functional connectivity graph of all recorded neurons can be represented by the standardized precision matrix or the corresponding undirected GGM [5].
While there is no standardized definition for functional cluster, many neuroscientific studies have found that each neuronal type has its own distinct input-output connectivity patterns [6] and neurons with similar connectivity patterns typically have similar neurological roles and functions [7]. Therefore, in this work, we seek to define functional clusters to be groups of neurons that share functional connectivity patterns. Hence, inferring functionally coherent groups of neurons is equivalent to clustering neurons with similar functional connectivity patterns.
While many techniques have been proposed for uncovering clusters from multivariate data (see [8] for a comprehensive review) as well as for finding community structures in network data (see [9] for a comprehensive review), they are somewhat limited in this application for various reasons. First of all, distance-based clustering techniques such as k-means and hierarchical clustering on pairwise Euclidean distances cluster variables based on the first-moment of the distribution, whereas functional clusters are defined by functional connectivity patterns, which are characterized by the second-moment of the distribution. Some studies in fMRI have applied hierarchical clustering on empirical partial correlation based dissimilarity matrix to cluster brain regions [10]. However, such approaches are not applicable to high-dimensional neural activity data because the MLE of partial correlation matrix does not exist due to singularity of the empirical covariance matrix. Others have taken a two-step approach where a functional connectivity graph is first estimated and then community detection algorithms are used subsequently to infer clusters [11, 12]. Yet such two-step approaches are highly sensitive to noise as a single erroneously estimated functional connection in the first step could adversely impact the clustering results of the community detection algorithms. Last but not least, some studies have proposed nonparametric Bayesian approaches for estimating the block structures of GGM and clustering variables using a MCMC sampling method [13]. However, such MCMC-based approaches can easily become computationally infeasible on moderate-sized graphs.
In this paper, we make several methodological contributions: (1) We propose the clustered GGM that involves a novel symmetric convex clustering penalty, which allows us to exploit the symmetric structures of a functional connectivity graph for better estimation of functional clusters. (2) We provide a tractable ADMM algorithm with convergence guarantees to fit our clustered GGM method in big-data settings. Because of these contributions, our clustered GGM method enjoys many advantages over existing approaches to inferring functional clusters from neural activity data: (i) With our novel symmetric convex clustering penalty, our method explicitly leverages functional connectivity patterns to cluster neurons into functionally coherent groups. (ii) Because the clustered GGM is formulated in an unified convex optimization framework, our single-step method is more stable and conducive to data-driven model selection.
2 The Clustered GGM
2.1 Model Setup and Background
Suppose the neural activity recordings of neurons over time points are arranged into the data matrix and the recording of all neurons at the th time point, , is a random -vector independently drawn from the same time-invariant -variate Gaussian distribution [4]. We can approximately achieve the assumption of independence by prewhitening the raw time series using appropriate time series models. As noted before, the functional connectivity graph can be represented by the standardized precision matrix , where . Hence, estimating functional clusters based on functional connectivity patterns is equivalent to recovering the group structures that form checkerboard patterns in .
2.2 The Symmetric Convex Clustering Penalty
At first glance, designing a penalty function to encourage checkerboard patterns in the estimate of seems straightforward as one might ask whether we can simply apply the convex biclustering fusion penalty [14] to simultaneously force rows and columns of to coalesce to form block structures. However, simply applying such biclustering penalty does not guarantee the same amount of fusion along the rows and columns of and it can easily result in different estimated functional clusters between rows and columns. In fact, any fusion penalty that directly regularizes elements of would lead to asymmetric estimates, thus leading to difficult interpretations. Also recognized by [13], designing a penalty function to force such checkerboard patterns in a GGM in a computationally feasible way is indeed a challenging task.
Our objective is to develop a convex penalty function that allows us to explicitly model functional clusters among neurons based on mutual pairwise functional connectivity patterns and preserve the symmetry of estimated functional connectivity graph as well as neuron cluster assignments. To this end, we build upon the convex fusion penalty [15, 16] and introduce a novel symmetric convex clustering penalty that encourages symmetric checkerboard patterns in the estimated precision matrix.
Consider a symmetric matrix , the symmetric convex clustering penalty function takes the form
[TABLE]
Here, we index a neuron pair by with and define the fusion set over the non-zero fusion weights . The set of all nonnegative, pairwise fusion weights can be specified beforehand to incorporate domain knowledge and take auxiliary information (e.g. interneuron distances) into account. Additionally, (and ) denotes the st (and nd) column of , which can be interpreted as cluster centroid matrix corresponding to the th neuron pair. For , the rows of consist of canonical basis vectors for and the columns of consist of canonical basis vectors for .
We now discuss the intuition behind the symmetric convex clustering penalty . For any neuron pair in the fusion set , the canonical basis matrices and extracts a portion of such that the st (and nd) column of represents the functional connectivity patterns of neuron (and neuron ) with all the other recorded neurons. is taken to be a copy of and the fusion penalty term induces sparsity in the difference between neuron and ’s respective functional connectivity patterns with all the other recorded neurons, thus encouraging the estimates of and to fuse. Neuron and are assigned to the same functional cluster if , which means neuron and have the same conditional relationships with all the other recorded neurons. All such fusions can be done separately and in parallel for each neuron pair , and the set of equality constraints aggregates all fusion results back to to form symmetric checkerboard patterns denoting functional clusters among neurons.
2.3 The Clustered GGM via Symmetric Convex Clustering
While one can apply the symmetric convex clustering penalty to any loss functions that take a symmetric matrix as input, we specifically apply to the negative log-likelihood of the multivariate Gaussian distribution to yield the clustered GGM problem.
[TABLE]
where denotes the set of positive definite matrices of size and denotes the empirical covariance matrix (assuming the columns of are properly centered). Unlike the GLasso problem [17], our clustered GGM does not aim to produce a sparse graph estimate. Instead, our clustered GGM leads to a graph estimate with block structures that indicate cluster assignments of nodes. In addition, fusing many elements of to the same values, the symmetric convex clustering penalty significantly reduces the effective number of parameters to be estimated, thus making our clustered GGM an attractive choice for high-dimensional settings. The amount of fusion, and hence the number of clusters, is determined by the penalty parameter . The optimal can be chosen via data-driven model selection techniques such as consensus clustering [18].
2.4 The Clustered GGM Algorithm
We adopt the generalized ADMM framework described in [19, 20] as well as an approach introduced in [15] for convex clustering problems in order to develop a tractable 3-block ADMM algorithm to solve (2.3). ADMM is an appealing algorithm for this problem because it permits us to decouple the terms in (2.3) that are challenging to jointly optimize. Specifically, we reformulate (2.3) by introducing a set of auxiliary variables and rewrite the penalty term in terms of these auxiliary variables.
[TABLE]
Following from [19, 15, 20], we give Algorithm 1 to solve the clustered GGM problem:
denotes the gradient of the corresponding augmented Lagrangian in scaled form evaluated at and is the stepsize of gradient descent, which can be selected via the Goldstein-Armijo line search procedure. Here, is the iteration counter for the outer 3-block ADMM updates and is the iteration counter for the inner gradient descent updates for the subproblem. are canonical basis vectors and is the directed difference matrix such that . Convergence of the algorithm is measured by the norm of the primal and dual residuals and the parameters are fixed throughout the algorithm as recommended by [19].
Proposition 1
Algorithm 1 converges to a global solution to problem (2.3).
Proof sketch: We can first recast our 3-block ADMM problem (2.4) as the general 3-block ADMM formulation described in [21] by re-writing the set of equality constraints in (2.4) as a linear combination of the three optimization variables:
,
where , ,
\mathbf{A}_{1}=\left[\begin{array}[]{c}\mathbf{B}_{2g\times p^{2}}\\ \mathbf{0}_{g\times p^{2}}\end{array}\right], \mathbf{A}_{2}=\left[\begin{array}[]{c}-\mathbf{I}_{2g}\\ \mathbf{H}_{g\times 2g}\end{array}\right], \mathbf{A}_{3}=\left[\begin{array}[]{c}\mathbf{0}_{2g\times g}\\ -\mathbf{I}_{g}\end{array}\right]
with rows of containing appropriate canonical basis vectors and containing directed difference matrices on its diagonal. For notational simplicity, we use . With , we can show that (2.4) satisfies the sufficient conditions (Theorem 2.4 in [21]) for the convergence of such 3-block ADMM algorithm.
3 Experiments
3.1 Synthetic Data
In this subsection, we evaluate the comparative performance of our clustered GGM method on simulated data sets.
3.1.1 Data Generation
Suppose we have neurons which form functional clusters, we simulate a standardized precision matrix with the desired checkerboard patterns reflecting groundtruth functional clusters as follows: first we define the groundtruth cluster membership for the neurons by creating which has exactly one in each row and at least one in each column. We then generate symmetric matrix where denotes the partial correlation between two neurons if both neurons are in the th cluster and denotes the partial correlation between a neuron from the th cluster and a neuron from the th cluster. Specifically, and . Next, we generate the groundtruth precision matrix and set the diagonal entries of to ’s to ensure positive-definiteness. Finally, we generate the data matrix according to . We consider two simulation scenarios: Scenario I with , and neurons are randomly divided into clusters with size , , and , respectively; Scenario II with , and neurons are randomly divided into clusters with size , , and , respectively.
3.1.2 Results
We compare our clustered GGM to other popular clustering approaches: 1) k-means; 2) Hierarchical Clustering (HC) with various linkage functions and dissimilarity metrics (Euclidean distance and empirical correlation); 3) Spectral Clustering (SC) with various similarity metrics (empirical correlation and various kernel functions), implemented using R packages anocva and kernlab; 4) GLasso followed by commonly used community detection algorithms such as the Louvain method [22], implemented using R packages huge and igraph. The best penalty parameter for the GLasso is selected by the ebic criterion embedded in huge. Moreover, the oracle number of functional clusters is supplied to all aforementioned clustering techniques. Such practice is reasonable in the neuroscientific context because the number of functional clusters is typically known a priori from domain knowledge.
In Table 1, results on functional cluster recovery are presented. In particular, the performance in terms of functional cluster recovery is quantified using Rand Index which measures the agreement between the unsupervised clustering solutions and the true cluster membership. Rand Index takes values between [math] and with 1 indicating perfect cluster recovery. We only include the best performing approaches from each category 2), 3), and 4) in Table 1. Results in Table 1 reveal that our clustered GGM outperforms all competing approaches in terms of functional cluster recovery.
3.2 Case Study: Calcium Imaging Data
We test our method on a publicly available calcium imaging data set from [23, 24]. Neural activity of a subset of excitatory neurons in mouse visual cortex was recorded using multi-plane acquisition and to planes of different depth were recorded at the same time at a sampling rate of about Hz (see [23] for detailed data acquisition and processing procedures). During the course of experiments, natural images were shown to an awake mouse sequentially and averaged responses of each recorded neuron to the visual stimuli were determined after adjusting for trial-to-trial variability via model-based approaches. Each neuron is said to be tuned to the natural image to which it had the largest averaged responses and was subsequently assigned a neuron tuning label. Such neuron tuning labels are often used as estimates of functional clusters. However, such empirically inferred neuron tuning labels are likely to be noisy and there could be considerable amount of uncertainty associated with functional groups determined solely by such tuning labels. In this case study, we seek to evaluate how well the noisy neuron tuning labels serve as proxies for identifying functional clusters of neurons in mouse visual cortex.
We select excitatory neurons residing in the most superficial imaging plane that were empirically determined to be tuned to three most dissimilar natural images. The calcium imaging data come in the form of deconvolved calcium traces, whose distributions are highly skewed. To accommodate our model assumptions of independence and Gaussianity, we prewhiten individual calcium traces with an autoregressive model of order to remove temporal dependence and subsequently perform the semiparametric copula transformation [25] to make the data approximately follow a multivariate Gaussian distribution. Afterwards, we apply our clustered GGM to the processed traces of these neurons across 855 time points at stimulus onset. Specifically, we fit the clustered GGM to the data on a fine grid of penalty parameter values such that all neurons are clustered into one group for . The best penalty parameter value selected is and the corresponding estimated functional clusters are displayed in the right panel of Fig. 1.
In Fig. 1, each node denotes a neuron and edges represent the functional connectivity graph. Nodes on the inner circle are colored according to the noisy neuron tuning labels whereas nodes on the outer circle are colored based upon estimated functional cluster labels by our clustered GGM. The Rand Index between neuron tuning labels and functional cluster labels estimated by our clustered GGM is . Such results show that the functional clusters estimated by our clustered GGM largely agree with the empirically determined neuron tuning labels except for a handful of singletons, suggesting that neuron tuning labels serve as good proxies for identifying functional clusters of neurons in mouse visual cortex.
4 CONCLUSIONS
In this paper, we have introduced the clustered GGM via symmetric convex clustering in an unified convex optimization framework, which can be used to infer functional clusters among neurons from neural activity recordings. Key contributions include developing a novel symmetric convex clustering penalty to explicitly group neurons with similar functional connectivity patterns as well as providing a tractable algorithm to solve the clustered GGM problem with notable convergence guarantees. Experimental results on both synthetic data and real-world neuroscientific data demonstrate the effectiveness of our proposed method.
Even though the focus of this paper has been on the clustered GGM problem, our novel symmetric convex clustering penalty can be applied to many other convex loss functions that take symmetric matrices as inputs. Such flexibility of our novel penalty function suggests that there is potential for broad application of our approach to data in areas such as genomics and proteomics.
5 Detailed Derivations
In this section, we provide derivations of the ADMM algorithm in Algorithm 1 for the clustered GGM. Our notation here is the same as that used in the main body of the paper unless otherwise stated.
After introducing the set of auxiliary variables and rewriting the clustered GGM problem as (2.4), the augmented Lagrangian in scaled form is given by:
[TABLE]
Following from [19], the scaled form of the ADMM updates are given by:
[TABLE]
where and are the corresponding dual variables. First, we consider the -update:
[TABLE]
This is smooth and so we compute the gradient with respect to :
[TABLE]
using the identity 111See Equation (119) in the Matrix Cookbook: https://www.math.uwaterloo.ca/ hwolkowi/matrixcookbook.pdf
[TABLE]
Then the solution to the first subproblem can be obtained by applying gradient descent to convergence.
Now we consider the -update:
[TABLE]
Since this is fully smooth, we take the gradient with respect to to obtain the stationarity conditions:
[TABLE]
Re-arranging the terms, we obtain
[TABLE]
Though an analytical solution can be obtained by:
[TABLE]
This update can quickly become computationally expensive as the dimension grows due to matrix inversion. To avoid such explicit computation of matrix inverse, we exploit the special structure in and take an approach that parallel those of the ADMM for the completely connected convex clustering problem [15]. By definition, is the directed difference matrix and can be simplified as follows:
[TABLE]
using the identity and . Expanding , we obtain:
[TABLE]
using the identity . Now the LHS of (3) becomes:
[TABLE]
using the identity . Similarly, the RHS of (3) can be re-written as follows:
[TABLE]
Hence, equation (3) can be re-written as
[TABLE]
Un-vectorizing both sides of (4), we obtain:
[TABLE]
Applying the Sherman-Morrison formula, we can write the inverse of as
[TABLE]
Therefore, solving (5) for , we obtain
[TABLE]
To solve the third subproblem, we note that it can be written as a proximal operator:
[TABLE]
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Philipp Berens, Jeremy Freeman, Thomas Deneux, Nikolay Chenkov, Thomas Mc Colgan, Artur Speiser, Jakob H. Macke, Srinivas C. Turaga, Patrick Mineault, Peter Rupprecht, Stephan Gerhard, Rainer W. Friedrich, Johannes Friedrich, Liam Paninski, Marius Pachitariu, Kenneth D. Harris, Ben Bolte, Timothy A. Machado, Dario Ringach, Jasmine Stone, Luke E. Rogerson, Nicolas J. Sofroniew, Jacob Reimer, Emmanouil Froudarakis, Thomas Euler, Miroslav Román Rosón, Lucas Theis, Andreas S. Tolias, and Matth
- 2[2] R. James Cotton, Emmanouil Froudarakis, Patrick Storer, Peter Saggau, and Andreas Tolias, “Three-dimensional mapping of microcircuit correlation structure,” Frontiers in Neural Circuits , vol. 7, pp. 151, 2013, DOI: 10.3389/fncir.2013.00151 .
- 3[3] Ildefons Magrans de Abril, Junichiro Yoshimoto, and Kenji Doya, “Connectivity inference from neural recording data: Challenges, mathematical bases and research directions,” Neural Networks , vol. 102, pp. 120 – 137, 2018, DOI: 10.1016/j.neunet.2018.02.016 .
- 4[4] Antonio Sutera, Arnaud Joly, Vincent François-Lavet, Zixiao Aaron Qiu, Gilles Louppe, Damien Ernst, and Pierre Geurts, “Simple connectome inference from partial correlation statistics in calcium imaging,” in Proceedings of the 2014 th International Conference on Neural Connectomics - Volume 46 . 2014, pp. 23–35, JMLR.org, URL: http://proceedings.mlr.press/v 46/sutera 15.pdf .
- 5[5] Steffen L Lauritzen, Graphical models , vol. 17, Clarendon Press, 1996.
- 6[6] Xiaolong Jiang, Shan Shen, Cathryn R. Cadwell, Philipp Berens, Fabian Sinz, Alexander S. Ecker, Saumil Patel, and Andreas S. Tolias, “Principles of connectivity among morphologically defined cell types in adult neocortex,” Science , vol. 350, no. 6264, 2015, DOI: 10.1126/science.aac 9462 .
- 7[7] Ed Bullmore and Olaf Sporns, “Complex brain networks: graph theoretical analysis of structural and functional systems,” Nature Reviews Neuroscience , vol. 10, pp. 186–198, 02 2009, DOI: 10.1038/nrn 2575 .
- 8[8] A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: A review,” ACM Comput. Surv. , vol. 31, no. 3, pp. 264–323, Sept. 1999, DOI: 10.1145/331499.331504 .
