Clustered Gaussian Graphical Model via Symmetric Convex Clustering

Tianyi Yao; Genevera I. Allen

arXiv:1905.13251·stat.ML·June 3, 2019

Clustered Gaussian Graphical Model via Symmetric Convex Clustering

Tianyi Yao, Genevera I. Allen

PDF

Open Access

TL;DR

This paper introduces a new convex clustering method for Gaussian graphical models to identify functional neuron groups from activity data, with proven convergence and demonstrated effectiveness.

Contribution

It proposes a novel symmetric convex clustering penalty within a convex optimization framework for neural data analysis.

Findings

01

Effective clustering on synthetic data

02

Successful application to real neuroscientific data

03

Convergence guarantees for the ADMM algorithm

Abstract

Knowledge of functional groupings of neurons can shed light on structures of neural circuits and is valuable in many types of neuroimaging studies. However, accurately determining which neurons carry out similar neurological tasks via controlled experiments is both labor-intensive and prohibitively expensive on a large scale. Thus, it is of great interest to cluster neurons that have similar connectivity profiles into functionally coherent groups in a data-driven manner. In this work, we propose the clustered Gaussian graphical model (GGM) and a novel symmetric convex clustering penalty in an unified convex optimization framework for inferring functional clusters among neurons from neural activity data. A parallelizable multi-block Alternating Direction Method of Multipliers (ADMM) algorithm is used to solve the corresponding convex optimization problem. In addition, we establish…

Figures1

Click any figure to enlarge with its caption.

Tables1

Table 1. Table 1 : Simulation results averaged over 10 10 10 replicates in terms of Rand Index. Best performing methods are boldfaced.

Dataset	Method	Rand Index
Scenario I	Clustered GGM	0.964 (0.068)
	k-means	0.505 (0.003)
	HC Euclidean Ward	0.498 (0.007)
	SC empirical corr	0.511 (0.006)
	HC empirical corr Ward	0.521 (0.027)
	GLasso $+$ Louvain	0.556 (0.022)
Scenario II	Clustered GGM	0.999 (0.003)
	k-means	0.515 (0.01)
	HC Euclidean Ward	0.791 (0.003)
	SC empirical corr	0.526 (0.002)
	HC empirical corr Ward	0.513 (0.023)
	GLasso $+$ Louvain	0.566 (0.003)

Equations61

P (Θ) =

P (Θ) =

subject to

Θ \in S_{++}^{p}, {Ψ_{l}} minimize

Θ \in S_{++}^{p}, {Ψ_{l}} minimize

+ λ l \in M \sum w_{l} ∣∣ Ψ_{l, 1} - Ψ_{l, 2} ∣ ∣_{2}

subject to

Θ \in S_{++}^{p}, {Ψ_{l}}, {δ_{l}} minimize

Θ \in S_{++}^{p}, {Ψ_{l}}, {δ_{l}} minimize

+ λ l \in M \sum w_{l} ∣∣ δ_{l} ∣ ∣_{2}

subject to

Ψ_{l, 1} - Ψ_{l, 2} - δ_{l} = 0, \forall l \in M

L_{ρ_{1}, ρ_{2}} (Θ, {Ψ_{l}}, {δ_{l}}, {U_{l}}, {z_{l}})

L_{ρ_{1}, ρ_{2}} (Θ, {Ψ_{l}}, {δ_{l}}, {U_{l}}, {z_{l}})

\displaystyle+\frac{\rho_{1}}{2}\sum_{l\in\mathcal{M}}\Big{(}||\mathbf{Q}_{l}\bm{\mathbf{\Theta}}\mathbf{R}_{l}-\bm{\mathbf{\Psi}}_{l}+\mathbf{U}_{l}||_{F}^{2}-||\mathbf{U}_{l}||_{F}^{2}\Big{)}

\displaystyle+\frac{\rho_{2}}{2}\sum_{l\in\mathcal{M}}\Big{(}||\mathbf{D}\text{vec}(\bm{\mathbf{\Psi}}_{l})-\bm{\mathbf{\delta}}_{l}+\mathbf{z}_{l}||_{2}^{2}-||\mathbf{z}_{l}||_{2}^{2}\Big{)}

Θ^{(k)}

Θ^{(k)}

Ψ_{l}^{(k)}

δ_{l}^{(k)}

U_{l}^{(k)}

z_{l}^{(k)}

Θ^{(k)} = Θ \in S_{++}^{p} arg min - lo g det Θ + trace (\hat{Σ} Θ) + \frac{ρ _{1}}{2} l \in M \sum ∣∣ Q_{l} Θ R_{l} - Ψ_{l}^{(k - 1)} + U_{l}^{(k - 1)} ∣ ∣_{F}^{2}

Θ^{(k)} = Θ \in S_{++}^{p} arg min - lo g det Θ + trace (\hat{Σ} Θ) + \frac{ρ _{1}}{2} l \in M \sum ∣∣ Q_{l} Θ R_{l} - Ψ_{l}^{(k - 1)} + U_{l}^{(k - 1)} ∣ ∣_{F}^{2}

\nabla_{Θ} L = - Θ^{- 1} + \hat{Σ} + \frac{ρ _{1}}{2} l \in M \sum (2 Q_{l}^{T} Q_{l} Θ R_{l} R_{l}^{T} - 2 Q_{l}^{T} Ψ_{l} R_{l}^{T} + 2 Q_{l}^{T} U_{l} R_{l}^{T})

\nabla_{Θ} L = - Θ^{- 1} + \hat{Σ} + \frac{ρ _{1}}{2} l \in M \sum (2 Q_{l}^{T} Q_{l} Θ R_{l} R_{l}^{T} - 2 Q_{l}^{T} Ψ_{l} R_{l}^{T} + 2 Q_{l}^{T} U_{l} R_{l}^{T})

\frac{\partial}{\partial Θ} ∣∣ A Θ B + C ∣ ∣_{F}^{2} = 2 A^{T} (A Θ B + C) B^{T} .

\frac{\partial}{\partial Θ} ∣∣ A Θ B + C ∣ ∣_{F}^{2} = 2 A^{T} (A Θ B + C) B^{T} .

Ψ_{l}^{(k)}

Ψ_{l}^{(k)}

= vec (Ψ_{l}) \in R^{2 (p - 2)} arg min \frac{ρ _{1}}{2} ∣∣ vec (Q_{l} Θ^{(k)} R_{l}) - vec (Ψ_{l}) + vec (U_{l}^{(k - 1)}) ∣ ∣_{2}^{2} + \frac{ρ _{2}}{2} ∣∣ D vec (Ψ_{l}) - δ_{l}^{(k - 1)} + z_{l}^{(k - 1)} ∣ ∣_{2}^{2}

\displaystyle-\rho_{1}\Big{(}\text{vec}(\mathbf{Q}_{l}\bm{\mathbf{\Theta}}^{(k)}\mathbf{R}_{l})-\text{vec}(\bm{\mathbf{\Psi}}_{l})+\text{vec}(\mathbf{U}_{l}^{(k-1)})\Big{)}+\rho_{2}\mathbf{D}^{T}\Big{(}\mathbf{D}\text{vec}(\bm{\mathbf{\Psi}}_{l})-\bm{\mathbf{\delta}}_{l}^{(k-1)}+\mathbf{z}_{l}^{(k-1)}\Big{)}=\mathbf{0}.

\displaystyle-\rho_{1}\Big{(}\text{vec}(\mathbf{Q}_{l}\bm{\mathbf{\Theta}}^{(k)}\mathbf{R}_{l})-\text{vec}(\bm{\mathbf{\Psi}}_{l})+\text{vec}(\mathbf{U}_{l}^{(k-1)})\Big{)}+\rho_{2}\mathbf{D}^{T}\Big{(}\mathbf{D}\text{vec}(\bm{\mathbf{\Psi}}_{l})-\bm{\mathbf{\delta}}_{l}^{(k-1)}+\mathbf{z}_{l}^{(k-1)}\Big{)}=\mathbf{0}.

(ρ_{1} I_{2 (p - 2)} + ρ_{2} D^{T} D) vec (Ψ_{l}) = ρ_{1} (vec (Q_{l} Θ^{(k)} R_{l}) + vec (U_{l}^{(k - 1)})) + ρ_{2} D^{T} (δ_{l}^{(k - 1)} - z_{l}^{(k - 1)})

(ρ_{1} I_{2 (p - 2)} + ρ_{2} D^{T} D) vec (Ψ_{l}) = ρ_{1} (vec (Q_{l} Θ^{(k)} R_{l}) + vec (U_{l}^{(k - 1)})) + ρ_{2} D^{T} (δ_{l}^{(k - 1)} - z_{l}^{(k - 1)})

vec (Ψ_{l}) = (ρ_{1} I_{2 (p - 2)} + ρ_{2} D^{T} D)^{- 1} [ρ_{1} (vec (Q_{l} Θ^{(k)} R_{l}) + vec (U_{l}^{(k - 1)})) + ρ_{2} D^{T} (δ_{l}^{(k - 1)} - z_{l}^{(k - 1)})]

vec (Ψ_{l}) = (ρ_{1} I_{2 (p - 2)} + ρ_{2} D^{T} D)^{- 1} [ρ_{1} (vec (Q_{l} Θ^{(k)} R_{l}) + vec (U_{l}^{(k - 1)})) + ρ_{2} D^{T} (δ_{l}^{(k - 1)} - z_{l}^{(k - 1)})]

D^{T} D

D^{T} D

= [(e_{1} - e_{2}) (e_{1} - e_{2})^{T}] \otimes I_{(p - 2)}

= (2 I_{2} - 1 1^{T}) \otimes I_{(p - 2)}

ρ_{1} I_{2 (p - 2)} + ρ_{2} D^{T} D

ρ_{1} I_{2 (p - 2)} + ρ_{2} D^{T} D

= [ρ_{1} I_{2} + ρ_{2} (2 I_{2} - 1 1^{T})] \otimes I_{(p - 1)}

(ρ_{1} I_{2 (p - 2)} + ρ_{2} D^{T} D) vec (Ψ_{l})

(ρ_{1} I_{2 (p - 2)} + ρ_{2} D^{T} D) vec (Ψ_{l})

\displaystyle=\text{vec}\Big{(}\bm{\mathbf{\Psi}}_{l}[\rho_{1}\mathbf{I}_{2}+\rho_{2}(2\mathbf{I}_{2}-\mathbf{1}\mathbf{1}^{T})]\Big{)}

\displaystyle=\rho_{1}\text{vec}(\mathbf{Q}_{l}\bm{\mathbf{\Theta}}^{(k)}\mathbf{R}_{l}+\mathbf{U}_{l}^{(k-1)})+\rho_{2}\text{vec}\Big{(}(\bm{\mathbf{\delta}}_{l}^{(k-1)}-\mathbf{z}_{l}^{(k-1)})(\mathbf{e}_{1}-\mathbf{e}_{2})^{T}\Big{)}

\displaystyle=\rho_{1}\text{vec}(\mathbf{Q}_{l}\bm{\mathbf{\Theta}}^{(k)}\mathbf{R}_{l}+\mathbf{U}_{l}^{(k-1)})+\rho_{2}\text{vec}\Big{(}(\bm{\mathbf{\delta}}_{l}^{(k-1)}-\mathbf{z}_{l}^{(k-1)})(\mathbf{e}_{1}-\mathbf{e}_{2})^{T}\Big{)}

\displaystyle\text{vec}\Big{(}\bm{\mathbf{\Psi}}_{l}[\rho_{1}\mathbf{I}_{2}+\rho_{2}(2\mathbf{I}_{2}-\mathbf{1}\mathbf{1}^{T})]\Big{)}=\rho_{1}\text{vec}(\mathbf{Q}_{l}\bm{\mathbf{\Theta}}^{(k)}\mathbf{R}_{l}+\mathbf{U}_{l}^{(k-1)})+\rho_{2}\text{vec}\Big{(}(\bm{\mathbf{\delta}}_{l}^{(k-1)}-\mathbf{z}_{l}^{(k-1)})(\mathbf{e}_{1}-\mathbf{e}_{2})^{T}\Big{)}

\displaystyle\text{vec}\Big{(}\bm{\mathbf{\Psi}}_{l}[\rho_{1}\mathbf{I}_{2}+\rho_{2}(2\mathbf{I}_{2}-\mathbf{1}\mathbf{1}^{T})]\Big{)}=\rho_{1}\text{vec}(\mathbf{Q}_{l}\bm{\mathbf{\Theta}}^{(k)}\mathbf{R}_{l}+\mathbf{U}_{l}^{(k-1)})+\rho_{2}\text{vec}\Big{(}(\bm{\mathbf{\delta}}_{l}^{(k-1)}-\mathbf{z}_{l}^{(k-1)})(\mathbf{e}_{1}-\mathbf{e}_{2})^{T}\Big{)}

Ψ_{l} [(1 + 2 \frac{ρ _{2}}{ρ _{1}}) I_{2} - \frac{ρ _{2}}{ρ _{1}} 1 1^{T}] = Q_{l} Θ^{(k)} R_{l} + U_{l}^{(k - 1)} + \frac{ρ _{2}}{ρ _{1}} (δ_{l}^{(k - 1)} - z_{l}^{(k - 1)}) (e_{1} - e_{2})^{T}

Ψ_{l} [(1 + 2 \frac{ρ _{2}}{ρ _{1}}) I_{2} - \frac{ρ _{2}}{ρ _{1}} 1 1^{T}] = Q_{l} Θ^{(k)} R_{l} + U_{l}^{(k - 1)} + \frac{ρ _{2}}{ρ _{1}} (δ_{l}^{(k - 1)} - z_{l}^{(k - 1)}) (e_{1} - e_{2})^{T}

[(1 + 2 \frac{ρ _{2}}{ρ _{1}}) I_{2} - \frac{ρ _{2}}{ρ _{1}} 1 1^{T}]^{- 1} = \frac{1}{1 + 2 \frac{ρ _{2}}{ρ _{1}}} (I_{2} + \frac{ρ _{2}}{ρ _{1}} 1 1^{T}) .

[(1 + 2 \frac{ρ _{2}}{ρ _{1}}) I_{2} - \frac{ρ _{2}}{ρ _{1}} 1 1^{T}]^{- 1} = \frac{1}{1 + 2 \frac{ρ _{2}}{ρ _{1}}} (I_{2} + \frac{ρ _{2}}{ρ _{1}} 1 1^{T}) .

Ψ_{l}^{(k)} = \frac{1}{1 + 2 \frac{ρ _{2}}{ρ _{1}}} [Q_{l} Θ^{(k)} R_{l} + U_{l}^{(k - 1)} + \frac{ρ _{2}}{ρ _{1}} (δ_{l}^{(k - 1)} - z_{l}^{(k - 1)}) (e_{1} - e_{2})^{T}] (I_{2} + \frac{ρ _{2}}{ρ _{1}} 1 1^{T}) .

Ψ_{l}^{(k)} = \frac{1}{1 + 2 \frac{ρ _{2}}{ρ _{1}}} [Q_{l} Θ^{(k)} R_{l} + U_{l}^{(k - 1)} + \frac{ρ _{2}}{ρ _{1}} (δ_{l}^{(k - 1)} - z_{l}^{(k - 1)}) (e_{1} - e_{2})^{T}] (I_{2} + \frac{ρ _{2}}{ρ _{1}} 1 1^{T}) .

δ_{l}^{(k)}

δ_{l}^{(k)}

= δ_{l} \in R^{p - 2} arg min \frac{λ w _{l}}{ρ _{2}} ∣∣ δ_{l} ∣ ∣_{2} + \frac{1}{2} ∣∣ δ_{l} - (D vec (Ψ_{l}^{(k)}) + z_{l}^{(k - 1)}) ∣ ∣_{2}^{2}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsFace and Expression Recognition · Remote-Sensing Image Classification · Gene expression and cancer classification

MethodsAlternating Direction Method of Multipliers

Full text

Clustered Gaussian Graphical Model via Symmetric Convex Clustering

Abstract

Knowledge of functional groupings of neurons can shed light on structures of neural circuits and is valuable in many types of neuroimaging studies. However, accurately determining which neurons carry out similar neurological tasks via controlled experiments is both labor-intensive and prohibitively expensive on a large scale. Thus, it is of great interest to cluster neurons that have similar connectivity profiles into functionally coherent groups in a data-driven manner. In this work, we propose the clustered Gaussian graphical model (GGM) and a novel symmetric convex clustering penalty in an unified convex optimization framework for inferring functional clusters among neurons from neural activity data. A parallelizable multi-block Alternating Direction Method of Multipliers (ADMM) algorithm is used to solve the corresponding convex optimization problem. In addition, we establish convergence guarantees for the proposed ADMM algorithm. Experimental results on both synthetic data and real-world neuroscientific data demonstrate the effectiveness of our approach.

**Index Terms— ** Gaussian graphical model, Convex clustering, ADMM, Computational neuroscience

1 INTRODUCTION

In neuroscience, an important goal is to identify which neurons are involved in similar computations and how they are organized into functionally coherent units to carry out specific computational tasks in the brain. Such knowledge of functional organizations of neurons could lead to a better understanding of structures of interconnected neural circuits and thus the operating mechanisms of the brain. Advancement of optical imaging technologies such as calcium imaging has enabled indirect recordings of spiking activity from thousands of neurons simultaneously [1, 2]. Learning the functional organizations of large neuronal populations from such high-dimensional neural activity recording data is a major challenge in computational neuroscience.

Functional connectivity, which is defined as statistical dependence among measurements of neuronal activity in [3], has been widely used to describe functional interactions among measured neuronal populations. Because functional connectivity is not directly observable, numerous techniques such as correlations and partial correlations have been proposed to estimate such functional connectivity from neural recording data (see [3] for a comprehensive review). In this work, we define functional connectivity between each pair of recorded neurons to be their pairwise partial correlation or edges in an undirected GGM in high dimensions. Because the pairwise partial correlation between two neurons takes activities of all the other recorded neurons into account, it captures only direct associations between neurons and discard all indirect associations [3, 4], which makes pairwise partial correlation coefficient a better indicator of functional connectivity than Pearson correlation. Furthermore, because pairwise partial correlation is the same as the corresponding off-diagonal entries of the standardized precision matrix, the functional connectivity graph of all recorded neurons can be represented by the standardized precision matrix or the corresponding undirected GGM [5].

While there is no standardized definition for functional cluster, many neuroscientific studies have found that each neuronal type has its own distinct input-output connectivity patterns [6] and neurons with similar connectivity patterns typically have similar neurological roles and functions [7]. Therefore, in this work, we seek to define functional clusters to be groups of neurons that share functional connectivity patterns. Hence, inferring functionally coherent groups of neurons is equivalent to clustering neurons with similar functional connectivity patterns.

While many techniques have been proposed for uncovering clusters from multivariate data (see [8] for a comprehensive review) as well as for finding community structures in network data (see [9] for a comprehensive review), they are somewhat limited in this application for various reasons. First of all, distance-based clustering techniques such as k-means and hierarchical clustering on pairwise Euclidean distances cluster variables based on the first-moment of the distribution, whereas functional clusters are defined by functional connectivity patterns, which are characterized by the second-moment of the distribution. Some studies in fMRI have applied hierarchical clustering on empirical partial correlation based dissimilarity matrix to cluster brain regions [10]. However, such approaches are not applicable to high-dimensional neural activity data because the MLE of partial correlation matrix does not exist due to singularity of the empirical covariance matrix. Others have taken a two-step approach where a functional connectivity graph is first estimated and then community detection algorithms are used subsequently to infer clusters [11, 12]. Yet such two-step approaches are highly sensitive to noise as a single erroneously estimated functional connection in the first step could adversely impact the clustering results of the community detection algorithms. Last but not least, some studies have proposed nonparametric Bayesian approaches for estimating the block structures of GGM and clustering variables using a MCMC sampling method [13]. However, such MCMC-based approaches can easily become computationally infeasible on moderate-sized graphs.

In this paper, we make several methodological contributions: (1) We propose the clustered GGM that involves a novel symmetric convex clustering penalty, which allows us to exploit the symmetric structures of a functional connectivity graph for better estimation of functional clusters. (2) We provide a tractable ADMM algorithm with convergence guarantees to fit our clustered GGM method in big-data settings. Because of these contributions, our clustered GGM method enjoys many advantages over existing approaches to inferring functional clusters from neural activity data: (i) With our novel symmetric convex clustering penalty, our method explicitly leverages functional connectivity patterns to cluster neurons into functionally coherent groups. (ii) Because the clustered GGM is formulated in an unified convex optimization framework, our single-step method is more stable and conducive to data-driven model selection.

2 The Clustered GGM

2.1 Model Setup and Background

Suppose the neural activity recordings of $p$ neurons over $n$ time points are arranged into the data matrix $\mathbf{X}\in\mathbb{R}^{n\times p}$ and the recording of all $p$ neurons at the $i$ th time point, $\mathbf{X}_{i}=\{X_{i1},\ldots,X_{ip}\}$ , is a random $p$ -vector independently drawn from the same time-invariant $p$ -variate Gaussian distribution $\mathcal{N}(\mathbf{0}_{p},\bm{\mathbf{\Sigma}}_{p\times p})$ [4]. We can approximately achieve the assumption of independence by prewhitening the raw time series using appropriate time series models. As noted before, the functional connectivity graph can be represented by the standardized precision matrix $\bm{\mathbf{\Theta}}\succ 0$ , where $\bm{\mathbf{\Theta}}_{ij}=-\bm{\mathbf{\Sigma}}^{-1}_{ij}/\sqrt{\bm{\mathbf{\Sigma}}^{-1}_{ii}\bm{\mathbf{\Sigma}}^{-1}_{jj}}$ . Hence, estimating functional clusters based on functional connectivity patterns is equivalent to recovering the group structures that form checkerboard patterns in $\bm{\mathbf{\Theta}}$ .

2.2 The Symmetric Convex Clustering Penalty

At first glance, designing a penalty function to encourage checkerboard patterns in the estimate of $\bm{\mathbf{\Theta}}$ seems straightforward as one might ask whether we can simply apply the convex biclustering fusion penalty [14] to simultaneously force rows and columns of $\bm{\mathbf{\Theta}}$ to coalesce to form block structures. However, simply applying such biclustering penalty does not guarantee the same amount of fusion along the rows and columns of $\bm{\mathbf{\Theta}}$ and it can easily result in different estimated functional clusters between rows and columns. In fact, any fusion penalty that directly regularizes elements of $\bm{\mathbf{\Theta}}$ would lead to asymmetric estimates, thus leading to difficult interpretations. Also recognized by [13], designing a penalty function to force such checkerboard patterns in a GGM in a computationally feasible way is indeed a challenging task.

Our objective is to develop a convex penalty function that allows us to explicitly model functional clusters among neurons based on mutual pairwise functional connectivity patterns and preserve the symmetry of estimated functional connectivity graph as well as neuron cluster assignments. To this end, we build upon the convex fusion penalty [15, 16] and introduce a novel symmetric convex clustering penalty that encourages symmetric checkerboard patterns in the estimated precision matrix.

Consider a $p\times p$ symmetric matrix $\bm{\mathbf{\Theta}}$ , the symmetric convex clustering penalty function takes the form

[TABLE]

Here, we index a neuron pair by $l=(i,j)$ with $1\leq i<j\leq p$ and define the fusion set over the non-zero fusion weights $\mathcal{M}=\{l=(i,j)\>:\>w_{l}>0\}$ . The set of all nonnegative, pairwise fusion weights $\{w_{l}\}_{1\leq i<j\leq p}$ can be specified beforehand to incorporate domain knowledge and take auxiliary information (e.g. interneuron distances) into account. Additionally, $\bm{\mathbf{\Psi}}_{l,1}$ (and $\bm{\mathbf{\Psi}}_{l,2}$ ) denotes the $1$ st (and $2$ nd) column of $\bm{\mathbf{\Psi}}_{l}\in\mathbb{R}^{(p-2)\times 2}$ , which can be interpreted as cluster centroid matrix corresponding to the $l=(i,j)$ th neuron pair. For $l=(i,j)$ , the rows of $\mathbf{Q}_{l}\in\mathbb{R}^{(p-2)\times p}$ consist of canonical basis vectors $\mathbf{e}_{q}$ for $q\in\{1,2,\ldots,p\}\setminus\{i,j\}$ and the columns of $\mathbf{R}_{l}\in\mathbb{R}^{p\times 2}$ consist of canonical basis vectors $\mathbf{e}_{r}$ for $r\in\{i,j\}$ .

We now discuss the intuition behind the symmetric convex clustering penalty $P(\bm{\mathbf{\Theta}})$ . For any neuron pair $l=(i,j)$ in the fusion set $\mathcal{M}$ , the canonical basis matrices $\mathbf{Q}_{l}$ and $\mathbf{R}_{l}$ extracts a portion of $\bm{\mathbf{\Theta}}$ such that the $1$ st (and $2$ nd) column of $\mathbf{Q}_{l}\bm{\mathbf{\Theta}}\mathbf{R}_{l}\in\mathbb{R}^{(p-2)\times 2}$ represents the functional connectivity patterns of neuron $i$ (and neuron $j$ ) with all the other recorded neurons. $\bm{\mathbf{\Psi}}_{l}$ is taken to be a copy of $\mathbf{Q}_{l}\bm{\mathbf{\Theta}}\mathbf{R}_{l}$ and the fusion penalty term $||\>\bm{\mathbf{\Psi}}_{l,1}-\bm{\mathbf{\Psi}}_{l,2}\>||_{2}$ induces sparsity in the difference between neuron $i$ and $j$ ’s respective functional connectivity patterns with all the other recorded neurons, thus encouraging the estimates of $\bm{\mathbf{\Psi}}_{l,1}$ and $\bm{\mathbf{\Psi}}_{l,2}$ to fuse. Neuron $i$ and $j$ are assigned to the same functional cluster if $\hat{\bm{\mathbf{\Psi}}}_{l,1}=\hat{\bm{\mathbf{\Psi}}}_{l,2}$ , which means neuron $i$ and $j$ have the same conditional relationships with all the other recorded neurons. All such fusions can be done separately and in parallel for each neuron pair $l\in\mathcal{M}$ , and the set of equality constraints $\mathbf{Q}_{l}\bm{\mathbf{\Theta}}\mathbf{R}_{l}-\bm{\mathbf{\Psi}}_{l}=0,\forall l\in\mathcal{M}$ aggregates all fusion results back to $\bm{\mathbf{\Theta}}$ to form symmetric checkerboard patterns denoting functional clusters among neurons.

2.3 The Clustered GGM via Symmetric Convex Clustering

While one can apply the symmetric convex clustering penalty to any loss functions that take a symmetric matrix as input, we specifically apply $P(\bm{\mathbf{\Theta}})$ to the negative log-likelihood of the multivariate Gaussian distribution to yield the clustered GGM problem.

[TABLE]

where $\mathbb{S}_{++}^{p}$ denotes the set of positive definite matrices of size $p$ and $\hat{\bm{\mathbf{\Sigma}}}=\frac{1}{n}\mathbf{X}^{T}\mathbf{X}$ denotes the empirical covariance matrix (assuming the columns of $\mathbf{X}$ are properly centered). Unlike the GLasso problem [17], our clustered GGM does not aim to produce a sparse graph estimate. Instead, our clustered GGM leads to a graph estimate $\hat{\bm{\mathbf{\Theta}}}$ with block structures that indicate cluster assignments of nodes. In addition, fusing many elements of $\bm{\mathbf{\Theta}}$ to the same values, the symmetric convex clustering penalty significantly reduces the effective number of parameters to be estimated, thus making our clustered GGM an attractive choice for high-dimensional settings. The amount of fusion, and hence the number of clusters, is determined by the penalty parameter $\lambda$ . The optimal $\lambda$ can be chosen via data-driven model selection techniques such as consensus clustering [18].

2.4 The Clustered GGM Algorithm

We adopt the generalized ADMM framework described in [19, 20] as well as an approach introduced in [15] for convex clustering problems in order to develop a tractable 3-block ADMM algorithm to solve (2.3). ADMM is an appealing algorithm for this problem because it permits us to decouple the terms in (2.3) that are challenging to jointly optimize. Specifically, we reformulate (2.3) by introducing a set of auxiliary variables $\{\bm{\mathbf{\delta}}_{l}\}$ and rewrite the penalty term in terms of these auxiliary variables.

[TABLE]

Following from [19, 15, 20], we give Algorithm 1 to solve the clustered GGM problem:

$\nabla_{\bm{\mathbf{\Theta}}}\mathcal{L}(\bm{\mathbf{\Theta}}^{(k,j-1)})$ denotes the gradient of the corresponding augmented Lagrangian in scaled form evaluated at $\bm{\mathbf{\Theta}}^{(k,j-1)}$ and $s_{j}$ is the stepsize of gradient descent, which can be selected via the Goldstein-Armijo line search procedure. Here, $k$ is the iteration counter for the outer 3-block ADMM updates and $j$ is the iteration counter for the inner gradient descent updates for the $\bm{\mathbf{\Theta}}$ subproblem. $\mathbf{e}_{1},\mathbf{e}_{2}\in\mathbb{R}^{2}$ are canonical basis vectors and $\mathbf{D}\in\mathbb{R}^{(p-2)\times 2(p-2)}=(\mathbf{e}_{1}-\mathbf{e}_{2})^{T}\otimes\mathbf{I}_{(p-2)}$ is the directed difference matrix such that $\mathbf{D}\text{vec}(\bm{\mathbf{\Psi}}_{l})=\bm{\mathbf{\Psi}}_{l,1}-\bm{\mathbf{\Psi}}_{l,2}$ . Convergence of the algorithm is measured by the norm of the primal and dual residuals and the parameters $\rho_{1},\rho_{2}>0$ are fixed throughout the algorithm as recommended by [19].

Proposition 1

Algorithm 1 converges to a global solution to problem (2.3).

Proof sketch: We can first recast our 3-block ADMM problem (2.4) as the general 3-block ADMM formulation described in [21] by re-writing the set of equality constraints in (2.4) as a linear combination of the three optimization variables:

$\mathbf{A}_{1}\text{vec}(\bm{\mathbf{\Theta}})+\mathbf{A}_{2}\text{vec}(\bm{\mathbf{\Psi}})+\mathbf{A}_{3}\text{vec}(\bm{\mathbf{\Delta}})=\mathbf{0}$ ,

where $\bm{\mathbf{\Psi}}=[\bm{\mathbf{\Psi}}_{1},\ldots,\bm{\mathbf{\Psi}}_{|\mathcal{M}|}]$ , $\bm{\mathbf{\Delta}}=[\bm{\mathbf{\delta}}_{1},\ldots,\bm{\mathbf{\delta}}_{|\mathcal{M}|}]$ ,

$\mathbf{A}_{1}=\left[\begin{array}[]{c}\mathbf{B}_{2g\times p^{2}}\\ \mathbf{0}_{g\times p^{2}}\end{array}\right]$ , $\mathbf{A}_{2}=\left[\begin{array}[]{c}-\mathbf{I}_{2g}\\ \mathbf{H}_{g\times 2g}\end{array}\right]$ , $\mathbf{A}_{3}=\left[\begin{array}[]{c}\mathbf{0}_{2g\times g}\\ -\mathbf{I}_{g}\end{array}\right]$

with rows of $\mathbf{B}$ containing appropriate canonical basis vectors and $\mathbf{H}$ containing $|\mathcal{M}|$ directed difference matrices $\mathbf{D}$ on its diagonal. For notational simplicity, we use $g=(p-2)|\mathcal{M}|$ . With $\mathbf{A}_{1}^{T}\mathbf{A}_{3}=\mathbf{0}$ , we can show that (2.4) satisfies the sufficient conditions (Theorem 2.4 in [21]) for the convergence of such 3-block ADMM algorithm.

3 Experiments

3.1 Synthetic Data

In this subsection, we evaluate the comparative performance of our clustered GGM method on simulated data sets.

3.1.1 Data Generation

Suppose we have $p$ neurons which form $k$ functional clusters, we simulate a standardized precision matrix $\bm{\mathbf{\Theta}}$ with the desired checkerboard patterns reflecting groundtruth functional clusters as follows: first we define the groundtruth cluster membership for the $p$ neurons by creating $\mathbf{Z}_{p\times k}\in\{0,1\}$ which has exactly one $1$ in each row and at least one $1$ in each column. We then generate symmetric matrix $\mathbf{B}_{k\times k}\in[-1,1]$ where $B_{ii}$ denotes the partial correlation between two neurons if both neurons are in the $i$ th cluster and $B_{ij}$ denotes the partial correlation between a neuron from the $i$ th cluster and a neuron from the $j$ th cluster. Specifically, $B_{ii}\overset{i.i.d.}{\sim}\text{Unif}([0.6,0.95])$ and $B_{ij}\overset{i.i.d.}{\sim}\text{Unif}([0,0.55])$ . Next, we generate the groundtruth precision matrix $\bm{\mathbf{\Theta}}=\mathbf{Z}\mathbf{B}\mathbf{Z}^{T}$ and set the diagonal entries of $\bm{\mathbf{\Theta}}$ to $1$ ’s to ensure positive-definiteness. Finally, we generate the data matrix $\mathbf{X}\in\mathbb{R}^{n\times p}$ according to $\mathbf{x}_{1},\ldots,\mathbf{x}_{n}\overset{i.i.d.}{\sim}\mathcal{N}(\mathbf{0},\bm{\mathbf{\Theta}}^{-1})$ . We consider two simulation scenarios: Scenario I with $n=110$ , and $p=50$ neurons are randomly divided into $k=3$ clusters with size $5$ , $15$ , and $30$ , respectively; Scenario II with $n=200$ , and $p=200$ neurons are randomly divided into $k=3$ clusters with size $30$ , $60$ , and $110$ , respectively.

3.1.2 Results

We compare our clustered GGM to other popular clustering approaches: 1) k-means; 2) Hierarchical Clustering (HC) with various linkage functions and dissimilarity metrics (Euclidean distance and empirical correlation); 3) Spectral Clustering (SC) with various similarity metrics (empirical correlation and various kernel functions), implemented using R packages anocva and kernlab; 4) GLasso followed by commonly used community detection algorithms such as the Louvain method [22], implemented using R packages huge and igraph. The best penalty parameter for the GLasso is selected by the ebic criterion embedded in huge. Moreover, the oracle number of functional clusters $k=3$ is supplied to all aforementioned clustering techniques. Such practice is reasonable in the neuroscientific context because the number of functional clusters is typically known a priori from domain knowledge.

In Table 1, results on functional cluster recovery are presented. In particular, the performance in terms of functional cluster recovery is quantified using Rand Index which measures the agreement between the unsupervised clustering solutions and the true cluster membership. Rand Index takes values between [math] and $1$ with 1 indicating perfect cluster recovery. We only include the best performing approaches from each category 2), 3), and 4) in Table 1. Results in Table 1 reveal that our clustered GGM outperforms all competing approaches in terms of functional cluster recovery.

3.2 Case Study: Calcium Imaging Data

We test our method on a publicly available calcium imaging data set from [23, 24]. Neural activity of a subset of excitatory neurons in mouse visual cortex was recorded using multi-plane acquisition and $10$ to $12$ planes of different depth were recorded at the same time at a sampling rate of about $3$ Hz (see [23] for detailed data acquisition and processing procedures). During the course of experiments, $32$ natural images were shown to an awake mouse sequentially and averaged responses of each recorded neuron to the visual stimuli were determined after adjusting for trial-to-trial variability via model-based approaches. Each neuron is said to be tuned to the natural image to which it had the largest averaged responses and was subsequently assigned a neuron tuning label. Such neuron tuning labels are often used as estimates of functional clusters. However, such empirically inferred neuron tuning labels are likely to be noisy and there could be considerable amount of uncertainty associated with functional groups determined solely by such tuning labels. In this case study, we seek to evaluate how well the noisy neuron tuning labels serve as proxies for identifying functional clusters of neurons in mouse visual cortex.

We select $52$ excitatory neurons residing in the most superficial imaging plane that were empirically determined to be tuned to three most dissimilar natural images. The calcium imaging data come in the form of deconvolved calcium traces, whose distributions are highly skewed. To accommodate our model assumptions of independence and Gaussianity, we prewhiten individual calcium traces with an autoregressive model of order $1$ to remove temporal dependence and subsequently perform the semiparametric copula transformation [25] to make the data approximately follow a multivariate Gaussian distribution. Afterwards, we apply our clustered GGM to the processed traces of these $52$ neurons across 855 time points at stimulus onset. Specifically, we fit the clustered GGM to the data on a fine grid of penalty parameter values $\lambda\in[0,1.92]$ such that all neurons are clustered into one group for $\lambda\geq 1.92$ . The best penalty parameter value selected is $\lambda=1.06$ and the corresponding estimated functional clusters are displayed in the right panel of Fig. 1.

In Fig. 1, each node denotes a neuron and edges represent the functional connectivity graph. Nodes on the inner circle are colored according to the noisy neuron tuning labels whereas nodes on the outer circle are colored based upon estimated functional cluster labels by our clustered GGM. The Rand Index between neuron tuning labels and functional cluster labels estimated by our clustered GGM is $0.868$ . Such results show that the functional clusters estimated by our clustered GGM largely agree with the empirically determined neuron tuning labels except for a handful of singletons, suggesting that neuron tuning labels serve as good proxies for identifying functional clusters of neurons in mouse visual cortex.

4 CONCLUSIONS

In this paper, we have introduced the clustered GGM via symmetric convex clustering in an unified convex optimization framework, which can be used to infer functional clusters among neurons from neural activity recordings. Key contributions include developing a novel symmetric convex clustering penalty to explicitly group neurons with similar functional connectivity patterns as well as providing a tractable algorithm to solve the clustered GGM problem with notable convergence guarantees. Experimental results on both synthetic data and real-world neuroscientific data demonstrate the effectiveness of our proposed method.

Even though the focus of this paper has been on the clustered GGM problem, our novel symmetric convex clustering penalty can be applied to many other convex loss functions that take symmetric matrices as inputs. Such flexibility of our novel penalty function suggests that there is potential for broad application of our approach to data in areas such as genomics and proteomics.

5 Detailed Derivations

In this section, we provide derivations of the ADMM algorithm in Algorithm 1 for the clustered GGM. Our notation here is the same as that used in the main body of the paper unless otherwise stated.

After introducing the set of auxiliary variables $\{\bm{\mathbf{\delta}}_{l}\}$ and rewriting the clustered GGM problem as (2.4), the augmented Lagrangian in scaled form is given by:

[TABLE]

Following from [19], the scaled form of the ADMM updates are given by:

[TABLE]

where $\{\mathbf{U}_{l}\}_{l\in\mathcal{M}}$ and $\{\mathbf{z}_{l}\}_{l\in\mathcal{M}}$ are the corresponding dual variables. First, we consider the $\bm{\mathbf{\Theta}}$ -update:

[TABLE]

This is smooth and so we compute the gradient with respect to $\bm{\mathbf{\Theta}}$ :

[TABLE]

using the identity 111See Equation (119) in the Matrix Cookbook: https://www.math.uwaterloo.ca/ hwolkowi/matrixcookbook.pdf

[TABLE]

Then the solution $\bm{\mathbf{\Theta}}^{(k)}$ to the first subproblem can be obtained by applying gradient descent to convergence.

Now we consider the $\bm{\mathbf{\Psi}}_{l}$ -update:

[TABLE]

Since this is fully smooth, we take the gradient with respect to $\text{vec}(\bm{\mathbf{\Psi}}_{l})$ to obtain the stationarity conditions:

[TABLE]

Re-arranging the terms, we obtain

[TABLE]

Though an analytical solution can be obtained by:

[TABLE]

This update can quickly become computationally expensive as the dimension $p$ grows due to matrix inversion. To avoid such explicit computation of matrix inverse, we exploit the special structure in $(\rho_{1}\mathbf{I}_{2(p-2)}+\rho_{2}\mathbf{D}^{T}\mathbf{D})$ and take an approach that parallel those of the ADMM for the completely connected convex clustering problem [15]. By definition, $\mathbf{D}\in\mathbb{R}^{(p-2)\times 2(p-2)}=(\mathbf{e}_{1}-\mathbf{e}_{2})^{T}\otimes\mathbf{I}_{(p-2)}$ is the directed difference matrix and $\mathbf{D}^{T}\mathbf{D}$ can be simplified as follows:

[TABLE]

using the identity $(\mathbf{A}\otimes\mathbf{B})(\mathbf{C}\otimes\mathbf{D})=(\mathbf{AC}\otimes\mathbf{BD})$ and $(\mathbf{e}_{1}-\mathbf{e}_{2})(\mathbf{e}_{1}-\mathbf{e}_{2})^{T}=2\mathbf{I}_{2}-\mathbf{1}\mathbf{1}^{T}$ . Expanding $(\rho_{1}\mathbf{I}_{2(p-2)}+\rho_{2}\mathbf{D}^{T}\mathbf{D})$ , we obtain:

[TABLE]

using the identity $\mathbf{A}\otimes\mathbf{C}+\mathbf{B}\otimes\mathbf{C}=(\mathbf{A}+\mathbf{B})\otimes\mathbf{C}$ . Now the LHS of (3) becomes:

[TABLE]

using the identity $[\mathbf{A}^{T}\otimes\mathbf{I}]\text{vec}(\mathbf{B})=\text{vec}(\mathbf{B}\mathbf{A})$ . Similarly, the RHS of (3) can be re-written as follows:

[TABLE]

Hence, equation (3) can be re-written as

[TABLE]

Un-vectorizing both sides of (4), we obtain:

[TABLE]

Applying the Sherman-Morrison formula, we can write the inverse of $[(1+2\frac{\rho_{2}}{\rho_{1}})\mathbf{I}_{2}-\frac{\rho_{2}}{\rho_{1}}\mathbf{1}\mathbf{1}^{T}]$ as

[TABLE]

Therefore, solving (5) for $\bm{\mathbf{\Psi}}_{l}$ , we obtain

[TABLE]

To solve the third subproblem, we note that it can be written as a proximal operator:

[TABLE]

Bibliography25

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Philipp Berens, Jeremy Freeman, Thomas Deneux, Nikolay Chenkov, Thomas Mc Colgan, Artur Speiser, Jakob H. Macke, Srinivas C. Turaga, Patrick Mineault, Peter Rupprecht, Stephan Gerhard, Rainer W. Friedrich, Johannes Friedrich, Liam Paninski, Marius Pachitariu, Kenneth D. Harris, Ben Bolte, Timothy A. Machado, Dario Ringach, Jasmine Stone, Luke E. Rogerson, Nicolas J. Sofroniew, Jacob Reimer, Emmanouil Froudarakis, Thomas Euler, Miroslav Román Rosón, Lucas Theis, Andreas S. Tolias, and Matth
2[2] R. James Cotton, Emmanouil Froudarakis, Patrick Storer, Peter Saggau, and Andreas Tolias, “Three-dimensional mapping of microcircuit correlation structure,” Frontiers in Neural Circuits , vol. 7, pp. 151, 2013, DOI: 10.3389/fncir.2013.00151 .
3[3] Ildefons Magrans de Abril, Junichiro Yoshimoto, and Kenji Doya, “Connectivity inference from neural recording data: Challenges, mathematical bases and research directions,” Neural Networks , vol. 102, pp. 120 – 137, 2018, DOI: 10.1016/j.neunet.2018.02.016 .
4[4] Antonio Sutera, Arnaud Joly, Vincent François-Lavet, Zixiao Aaron Qiu, Gilles Louppe, Damien Ernst, and Pierre Geurts, “Simple connectome inference from partial correlation statistics in calcium imaging,” in Proceedings of the 2014 th International Conference on Neural Connectomics - Volume 46 . 2014, pp. 23–35, JMLR.org, URL: http://proceedings.mlr.press/v 46/sutera 15.pdf .
5[5] Steffen L Lauritzen, Graphical models , vol. 17, Clarendon Press, 1996.
6[6] Xiaolong Jiang, Shan Shen, Cathryn R. Cadwell, Philipp Berens, Fabian Sinz, Alexander S. Ecker, Saumil Patel, and Andreas S. Tolias, “Principles of connectivity among morphologically defined cell types in adult neocortex,” Science , vol. 350, no. 6264, 2015, DOI: 10.1126/science.aac 9462 .
7[7] Ed Bullmore and Olaf Sporns, “Complex brain networks: graph theoretical analysis of structural and functional systems,” Nature Reviews Neuroscience , vol. 10, pp. 186–198, 02 2009, DOI: 10.1038/nrn 2575 .
8[8] A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: A review,” ACM Comput. Surv. , vol. 31, no. 3, pp. 264–323, Sept. 1999, DOI: 10.1145/331499.331504 .