Extraction of hierarchical functional connectivity components in human brain using resting-state fMRI

Dushyant Sahoo; Theodore D. Satterthwaite; Christos Davatzikos

arXiv:1906.08365·q-bio.NC·March 2, 2021


TL;DR

This paper introduces a novel hierarchical sparse connectivity pattern extraction method from resting-state fMRI data, revealing multi-scale brain organization and improving reproducibility over existing approaches.

Contribution

A new hierarchical decomposition method for fMRI correlation matrices using deep factorization and non-convex optimization, enhancing interpretability and reproducibility of brain connectivity patterns.

Findings

1. Hierarchical SCPs are reproducible across datasets.

2. Multi-scale patterns outperform single-scale in reproducibility.

3. Method provides new insights into brain functional organization.

Abstract

The study of hierarchy in networks of the human brain has been of significant interest among researchers, as numerous studies have pointed towards a functional hierarchical organization of the human brain. This paper provides a novel method for the extraction of hierarchical connectivity components in the human brain using resting-state fMRI. The method builds upon prior work on Sparse Connectivity Patterns (SCPs) by introducing a hierarchy of sparse overlapping patterns. The components are estimated by deep factorization of correlation matrices generated from fMRI. The goal of the paper is to extract interpretable hierarchical patterns using correlation matrices where a low rank decomposition is formed by a linear combination of a high rank decomposition. We formulate the decomposition as a non-convex optimization problem and solve it using gradient descent algorithms with…

Tables

Table I: Similarity comparison (mean $\pm$ std) on simulated dataset. The rows correspond to values of $k_{1}$ and the columns correspond to values of $k_{2}$.

$k_{1}$	Method	$k_{2}=10$	$k_{2}=15$	$k_{2}=20$
$5$	hSCP	$0.8293 \pm 0.0467$	$0.8097 \pm 0.0728$	$0.8305 \pm 0.0614$
	EAGLE	$0.4051 \pm 0.0304$	$0.4180 \pm 0.0290$	$0.4068 \pm 0.0070$
	OSLOM	$0.6866 \pm 0.0442$	$0.6955 \pm 0.0362$	-
$6$	hSCP	$0.8421 \pm 0.0585$	$0.8660 \pm 0.0286$	$0.8497 \pm 0.0292$
	EAGLE	$0.3867 \pm 0.0141$	$0.4855 \pm 0.0731$	$0.4463 \pm 0.0334$
	OSLOM	$0.6249 \pm 0.0554$	$0.7302 \pm 0.0431$	-
$8$	hSCP	$0.8350 \pm 0.0666$	$0.8457 \pm 0.0353$	$0.8454 \pm 0.0385$
	EAGLE	$0.4408 \pm 0.0857$	$0.5339 \pm 0.0900$	$0.4099 \pm 0.0274$
	OSLOM	$0.6610 \pm 0.0540$	-	-

Table II: Reproducibility comparison (mean $\pm$ std) on HCP dataset. The rows correspond to values of $k_{1}$ and the columns correspond to values of $k_{2}$.

$k_{1}$	Method	$k_{2}=10$	$k_{2}=15$	$k_{2}=20$
$4$	hSCP	$0.8885 \pm 0.0441$	$0.8351 \pm 0.0748$	$0.8507 \pm 0.0635$
	EAGLE	$0.3077 \pm 0.0981$	$0.4158 \pm 0.1321$	-
	OSLOM	$0.7493 \pm 0.0882$	-	-
$5$	hSCP	$0.8753 \pm 0.0348$	$0.8356 \pm 0.0591$	$0.8281 \pm 0.0656$
	EAGLE	$0.2908 \pm 0.0737$	$0.2664 \pm 0.0333$	$0.0792 \pm 0.1656$
	OSLOM	$0.6092 \pm 0.0733$	-	-
$6$	hSCP	$0.8756 \pm 0.0375$	$0.8461 \pm 0.0486$	$0.8224 \pm 0.0555$
	EAGLE	$0.2356 \pm 0.0196$	$0.3209 \pm 0.1206$	$0.3717 \pm 0.1698$
	OSLOM	$0.5791 \pm 0.0792$	-	-
$8$	hSCP	$0.8781 \pm 0.0694$	$0.8389 \pm 0.0479$	$0.8240 \pm 0.0460$
	EAGLE	-	-	$0.3374 \pm 0.1672$
	OSLOM	-	-	-

Table III: Reproducibility comparison (mean $\pm$ std) on PNC dataset. The rows correspond to values of $k_{1}$ and the columns correspond to values of $k_{2}$.

$k_{1}$	Method	$k_{2}=10$	$k_{2}=15$	$k_{2}=20$
$4$	hSCP	$0.8838 \pm 0.0495$	$0.7998 \pm 0.0766$	$0.8036 \pm 0.0599$
	EAGLE	$0.6287 \pm 0.3005$	$0.6433 \pm 0.1321$	$0.6046 \pm 0.2981$
	OSLOM	$0.6780 \pm 0.0537$	-	-
$5$	hSCP	$0.8785 \pm 0.0675$	$0.8379 \pm 0.0704$	$0.8099 \pm 0.0736$
	EAGLE	$0.6575 \pm 0.1973$	$0.5327 \pm 0.1828$	$0.5426 \pm 0.1656$
	OSLOM	$0.5867 \pm 0.0869$	-	-
$6$	hSCP	$0.8655 \pm 0.0404$	$0.8364 \pm 0.0649$	$0.8518 \pm 0.0587$
	EAGLE	$0.7571 \pm 0.2366$	$0.6279 \pm 0.1011$	$0.6244 \pm 0.2627$
	OSLOM	$0.6391 \pm 0.1266$	-	-
$8$	hSCP	$0.8670 \pm 0.0559$	$0.8347 \pm 0.0517$	$0.8340 \pm 0.0657$
	EAGLE	-	$0.7451 \pm 0.0319$	$0.5933 \pm 0.2126$
	OSLOM	$0.5479 \pm 0.0987$	-	-

Table IV: Prediction performance comparison of hSCP, EAGLE and OSLOM. The rows correspond to values of $k_{1}$ and the columns correspond to values of $k_{2}$.

		Correlation				MAE (years)
$k_{1}$	Method	$10$	$15$	$20$	$25$	$10$	$15$	$20$	$25$
$4$	hSCP	$0.259 \pm 0.010$	$0.301 \pm 0.014$	$0.319 \pm 0.012$	$0.377 \pm 0.010$	$3.20 \pm 0.06$	$3.16 \pm 0.10$	$3.13 \pm 0.09$	$3.06 \pm 0.14$
	EAGLE	$0.246 \pm 0.004$	$0.298 \pm 0.009$	$0.300 \pm 0.004$	$0.347 \pm 0.005$	$3.22 \pm 0.01$	$3.20 \pm 0.01$	$3.15 \pm 0.01$	$3.10 \pm 0.01$
	OSLOM	$0.209 \pm 0.007$	-	-	-	$3.25 \pm 0.01$	-	-	-
$5$	hSCP	$0.263 \pm 0.010$	$0.327 \pm 0.025$	$0.379 \pm 0.028$	$0.403 \pm 0.017$	$3.19 \pm 0.07$	$3.10 \pm 0.17$	$3.06 \pm 0.21$	$3.03 \pm 0.13$
	EAGLE	$0.259 \pm 0.003$	$0.298 \pm 0.002$	$0.301 \pm 0.006$	-	$3.21 \pm 0.01$	$3.13 \pm 0.01$	$3.09 \pm 0.01$	-
	OSLOM	$0.217 \pm 0.005$	-	-	-	$3.24 \pm 0.01$	-	-	-
$6$	hSCP	$0.257 \pm 0.013$	$0.342 \pm 0.027$	$0.381 \pm 0.021$	$0.407 \pm 0.022$	$3.20 \pm 0.08$	$3.11 \pm 0.18$	$3.06 \pm 0.16$	$3.02 \pm 0.18$
	EAGLE	$0.281 \pm 0.004$	$0.308 \pm 0.005$	$0.321 \pm 0.007$	-	$3.18 \pm 0.01$	$3.15 \pm 0.01$	$3.14 \pm 0.01$	-
	OSLOM	$0.236 \pm 0.008$	-	-	-	$3.20 \pm 0.01$	-	-	-
$8$	hSCP	$0.278 \pm 0.022$	$0.372 \pm 0.026$	$0.382 \pm 0.023$	$0.409 \pm 0.010$	$3.18 \pm 0.14$	$3.07 \pm 0.18$	$3.05 \pm 0.17$	$3.02 \pm 0.13$
	EAGLE	-	$0.311 \pm 0.003$	$0.326 \pm 0.007$	-	-	$3.17 \pm 0.01$	$3.13 \pm 0.02$	-
	OSLOM	$0.264 \pm 0.007$	-	-	-	$3.20 \pm 0.01$	-	-	-

Table V: p-value from Wilcoxon signed-rank test on different performance measures. The rows correspond to values of $k_{1}$ and the columns correspond to values of $k_{2}$.

		Correlation				MAE (years)
$k_{1}$	Method	$10$	$15$	$20$	$25$	$10$	$15$	$20$	$25$
$4$	EAGLE	$0.0034$	$0.0229$	$3.6 \times 10^{- 4}$	$4.7 \times 10^{- 5}$	$0.0159$	$0.0108$	$0.0323$	$0.0351$
$4$	OSLOM	$4.7 \times 10^{- 5}$	-	-	-	$0.0012$	-	-	-
$5$	EAGLE	$0.0447$	$1.1 \times 10^{- 4}$	$4.7 \times 10^{- 5}$	-	$0.0447$	$0.1198$	$0.0108$	-
$5$	OSLOM	$4.7 \times 10^{- 5}$	-	-	-	$0.0413$	-	-	-
$6$	EAGLE	$1$	$6.2 \times 10^{- 4}$	$4.7 \times 10^{- 5}$	-	$0.9727$	$0.0653$	$0.0145$	-
$6$	OSLOM	$1.3 \times 10^{- 4}$	-	-	-	$0.778$	-	-	-
$8$	EAGLE	-	$4.7 \times 10^{- 5}$	$4.7 \times 10^{- 5}$	-	-	$0.0447$	$0.0209$	-
$8$	OSLOM	$0.0175$	-	-	-	$0.0563$	-	-	-

Equations

SCP problem:

$$
\begin{aligned}
\underset{\mathbf{W},\mathbf{\Lambda}}{\text{minimize}}\quad & \sum_{i=1}^{S}\|\mathbf{\Theta}^{i}-\mathbf{W}\mathbf{\Lambda}^{i}\mathbf{W}^{T}\|_{F}^{2}\\
\text{subject to}\quad & \|\mathbf{w}_{l}\|_{1}\leq\lambda,\quad l=1,\dots,k\\
& \|\mathbf{w}_{l}\|_{\infty}\leq 1,\quad l=1,\dots,k\\
& \mathbf{\Lambda}^{i}\succeq 0,\quad i=1,\dots,S
\end{aligned}
$$

Hierarchical decomposition:

$$
\begin{aligned}
\mathbf{\Theta}^{i} &\approx \mathbf{W}_{1}\mathbf{\Lambda}_{1}^{i}\mathbf{W}_{1}^{T}\\
\mathbf{\Theta}^{i} &\approx \mathbf{W}_{1}\mathbf{W}_{2}\mathbf{\Lambda}_{2}^{i}\mathbf{W}_{2}^{T}\mathbf{W}_{1}^{T}\\
&\ \ \vdots\\
\mathbf{\Theta}^{i} &\approx \mathbf{W}_{1}\mathbf{W}_{2}\cdots\mathbf{W}_{K}\mathbf{\Lambda}_{K}^{i}\mathbf{W}_{K}^{T}\mathbf{W}_{K-1}^{T}\cdots\mathbf{W}_{1}^{T}
\end{aligned}
$$

Level-wise minimization problems:

$$
\begin{aligned}
\min_{\mathbf{W}_{1},\mathbf{\Lambda}_{1}}\ & \sum_{i=1}^{S}\|\mathbf{\Theta}^{i}-\mathbf{W}_{1}\mathbf{\Lambda}_{1}^{i}\mathbf{W}_{1}^{T}\|_{F}^{2}\\
\min_{\mathbf{W}_{1},\mathbf{W}_{2},\mathbf{\Lambda}_{2}}\ & \sum_{i=1}^{S}\|\mathbf{\Theta}^{i}-\mathbf{W}_{1}\mathbf{W}_{2}\mathbf{\Lambda}_{2}^{i}\mathbf{W}_{2}^{T}\mathbf{W}_{1}^{T}\|_{F}^{2}\\
&\ \ \vdots\\
\min_{\mathcal{W},\mathbf{\Lambda}_{K}}\ & \sum_{i=1}^{S}\|\mathbf{\Theta}^{i}-\mathbf{W}_{1}\mathbf{W}_{2}\cdots\mathbf{W}_{K}\mathbf{\Lambda}_{K}^{i}\mathbf{W}_{K}^{T}\mathbf{W}_{K-1}^{T}\cdots\mathbf{W}_{1}^{T}\|_{F}^{2}
\end{aligned}
$$

Joint hSCP problem:

$$
\begin{aligned}
\underset{\mathcal{W},\mathcal{L}}{\text{minimize}}\quad & H(\mathcal{W},\mathcal{L})\\
\text{subject to}\quad & \|\mathbf{w}_{l}^{r}\|_{1}<\lambda_{r},\quad l=1,\dots,k_{r}\ \text{and}\ r=1,\dots,K\\
& \|\mathbf{w}_{l}^{r}\|_{\infty}\leq 1,\quad l=1,\dots,k_{r}\ \text{and}\ r=1,\dots,K\\
& \mathbf{W}_{j}\geq 0,\quad j=2,\dots,K\\
& \mathbf{\Lambda}_{r}^{i}\succeq 0,\quad i=1,\dots,S\ \text{and}\ r=1,\dots,K\\
& \mathop{\rm trace}\nolimits(\mathbf{\Lambda}_{r}^{i})=1,\quad i=1,\dots,S\ \text{and}\ r=1,\dots,K
\end{aligned}
$$

Auxiliary variables:

$$
\mathbf{W}_{0}=\mathbf{I}_{P},\qquad
\mathbf{Y}_{r}=\prod_{j=0}^{r}\mathbf{W}_{j},\qquad
\mathbf{T}_{n,i}^{r}=\Big(\prod_{j=1}^{n-r}\mathbf{W}_{j}\Big)\mathbf{\Lambda}_{n-r}^{i}\Big(\prod_{j=1}^{n-r}\mathbf{W}_{j}\Big)^{T}
$$

Gradients (only the left-hand sides and one term survive in the extracted list):

$$
\frac{\partial H}{\partial\mathbf{\Lambda}_{r}^{i}},\qquad
\frac{\partial H}{\partial\mathbf{W}_{r}}
$$

$$
+\,4\,\mathbf{Y}_{r-1}^{T}\mathbf{Y}_{r-1}\mathbf{W}_{r}\mathbf{T}_{j,i}^{r}\mathbf{W}_{r}^{T}\mathbf{Y}_{r-1}^{T}\mathbf{Y}_{r-1}\mathbf{W}_{r}\mathbf{T}_{j,i}^{r}
$$

Percentage error:

$$
\frac{\sum_{i=1}^{S}\sum_{r=1}^{K}\|\mathbf{\Theta}^{i}-(\prod_{j=1}^{r}\mathbf{W}_{j})\mathbf{\Lambda}_{r}^{i}(\prod_{n=1}^{r}\mathbf{W}_{n})^{T}\|_{F}^{2}}{\sum_{i=1}^{S}\sum_{r=1}^{K}\|\mathbf{\Theta}^{i}\|_{F}^{2}}
$$

Simulated correlation matrices:

$$
\mathbf{\Theta}^{i}=\mathbf{W}_{1}\mathbf{W}_{2}\mathbf{\Lambda}^{i}\mathbf{W}_{2}^{T}\mathbf{W}_{1}^{T}+\mathbf{E}_{i}
$$

Problem minimized over $\mathcal{W}$, $\mathcal{L}$ and $C$, with the same constraints as the joint hSCP problem:

$$
\begin{aligned}
& \|\mathbf{w}_{l}^{r}\|_{1}<\lambda_{r},\quad l=1,\dots,k_{r}\ \text{and}\ r=1,\dots,K\\
& \|\mathbf{w}_{l}^{r}\|_{\infty}\leq 1,\quad l=1,\dots,k_{r}\ \text{and}\ r=1,\dots,K\\
& \mathbf{W}_{j}\geq 0,\quad j=2,\dots,K\\
& \mathbf{\Lambda}_{r}^{i}\succeq 0,\quad i=1,\dots,S\ \text{and}\ r=1,\dots,K\\
& \mathop{\rm trace}\nolimits(\mathbf{\Lambda}_{r}^{i})=1,\quad i=1,\dots,S\ \text{and}\ r=1,\dots,K
\end{aligned}
$$


Full text

Hierarchical extraction of functional connectivity components in human brain using resting-state fMRI

Dushyant Sahoo, Theodore D. Satterthwaite, Christos Davatzikos

Dushyant Sahoo and Christos Davatzikos are with the Department of Electrical and Systems Engineering, University of Pennsylvania, PA, USA (e-mail: [email protected]; [email protected]). Theodore D. Satterthwaite is with the Department of Psychiatry, University of Pennsylvania, PA, USA (e-mail: [email protected]).

Abstract

The study of functional networks of the human brain has been of significant interest in cognitive neuroscience for over two decades, although these networks are typically extracted at a single scale using various methods, including decompositions like ICA. However, since numerous studies have suggested that the functional organization of the brain is hierarchical, analogous hierarchical decompositions might better capture functional connectivity patterns. Moreover, hierarchical decompositions can efficiently reduce the very high dimensionality of functional connectivity data. This paper provides a novel method for the extraction of hierarchical connectivity components in the human brain using resting-state fMRI. The method builds upon prior work on Sparse Connectivity Patterns (SCPs) by introducing a hierarchy of sparse, potentially overlapping patterns. The components are estimated by cascaded factorization of correlation matrices generated from fMRI. The goal of the paper is to extract sparse, interpretable, hierarchically organized patterns using correlation matrices, where a low rank decomposition is formed by a linear combination of a higher rank decomposition. We formulate the decomposition as a non-convex optimization problem and solve it using gradient descent algorithms with adaptive step size. Along with the hierarchy, our method aims to capture the heterogeneity of the set of common patterns across individuals. We first validate our model through simulated experiments. We then demonstrate the effectiveness of the developed method on two different real-world datasets by showing that multi-scale hierarchical SCPs are reproducible between sub-samples and are more reproducible than single-scale patterns. We also compare our method with an existing hierarchical community detection approach.

Index Terms:

Connectivity analysis, Matrix factorization, Hierarchical decomposition, fMRI

I Introduction

It is well known that the human brain consists of spatially distinct regions which are functionally connected to form networks [1]. In addition, these networks are thought to be hierarchically organized in the brain [2, 3, 4, 5]. However, our understanding of the hierarchical nature of these networks is limited due to their complexity. Most of the commonly used methods for the analysis of functional networks, such as Independent Component Analysis (ICA) [6], Sparse Dictionary Learning (DL) [7] and graph theory based network analysis [8], focus on estimating a fixed number of networks with no hierarchy. If the assumption about the hierarchy is true, then the original data might contain complex hierarchical information with implicit lower-level hidden attributes that classical one-level connectivity methodologies would not be able to capture effectively and interpretably. Notably, existing methods can extract different numbers of components but cannot describe relationships between the components.

Most of the methods used for estimation of hierarchical networks in fMRI data analysis are of the agglomerative (“bottom-up”) type, such as Hierarchical Clustering [9, 10] and Hierarchical Community Detection [11], where the method begins by regarding each element as a separate network and then successively merges them into larger networks. Most hierarchical community detection approaches [12, 13, 14], which investigate multi-scale brain networks by manipulating the number of communities, assume that the communities are independent. However, this is not the case in the human brain, where it is known that certain brain regions interact with multiple networks, i.e., the networks overlap [15]. Relatively few hierarchical community detection methods [16, 17, 18, 19] have been developed which find overlapping communities. Moreover, in community detection approaches, negative edge links are treated as repulsion. Previously, most approaches applied thresholds before their analysis and estimated networks by using sparse graphs. The rationale for thresholding was that the strong edges contain the most relevant information, which leads to the removal of negative edges. In contrast, in resting fMRI, a negative edge link carries essential information on functional co-variation with the opposing phase [20] and has a substantial physiological basis [21, 22]. These relations may play an important role in neuropsychiatric disorders and cognitive differentiation [23]. Some studies have recently shown that the weak network edges contain unique information that cannot be revealed by analysis of just the strong edges [24, 25]. Assigning anti-correlated and correlated regions to the same component can reveal more details about the organization of the human brain patterns [26, 27, 28], as long as the components are interpreted correctly.

Non-negative Matrix Factorization (NMF) [29] is a common matrix decomposition approach which many researchers use for obtaining information about community structure by analyzing a low-dimensional matrix. NMF has been used to find hierarchical structure [30], and recently [31] used Deep Semi Non-negative Matrix Factorization [32] for estimating hierarchical, potentially overlapping, functional networks. Because the method is based on non-negative matrix factorization, the model given by [31] is limited to positive matrices and could only find networks containing regions with positive correlation between them.

Our work addresses the aforementioned limitations by modeling the fMRI data to capture essential properties of the network, namely: 1) Sparsity: only a small subset of nodes interact with other nodes in a given network; 2) Heterogeneity: some networks might be more prominent in particular individuals as compared to others; 3) Existence of positively and negatively correlated nodes in a network; 4) Overlapping networks, which is likely to reflect true brain organization, as brain networks might share certain regional components; and 5) Hierarchy: by adding extra layers of abstraction we can learn latent attributes and the hierarchy in the networks. Our method is built upon Sparse Connectivity Patterns (SCPs) [26], which can be considered a symmetric CP decomposition for which an indirect fitting procedure makes the model structure equivalent to the PARAFAC2 model representation considered in [33], with the addition of sparsity rather than orthogonality. Our method aims to find Hierarchical Sparse Connectivity Patterns (hSCPs) by jointly decomposing correlation matrices into multiple components having different ranks using a cascaded framework for matrix factorization. We use gradient descent with adaptive step size for solving the non-convex optimization problem, and we also introduce an initialization algorithm that makes the algorithm deterministic and faster. We evaluate the representation learned by the model on two different real datasets and compare it with EAGLE [17] and OSLOM [18], which are well-known hierarchical overlapping community detection algorithms. We also provide an extension of the model for clustering the data using hSCPs, which could help in understanding hSCPs and their distribution.

The remainder of the paper is organized as follows. In Section II, we present the method for the extraction of hSCPs shared between rs-fMRI scans. Section III presents experimental results for validation of the method on simulated datasets and its effectiveness on the rs-fMRI scans of the 100 unrelated HCP subjects [34] and 969 subjects from the Philadelphia Neurodevelopmental Cohort (PNC) data set [35]. We conclude with a discussion.

II Method

II-A Sparse Connectivity Patterns

Let $\mathbf{X}^{i}\in\mathbb{R}^{P\times T}$ be the fMRI data of the $i^{th}$ subject having $P$ regions and $T$ time points, and let $\mathbf{\Theta}^{i}\in\mathbb{S}^{P\times P}_{++}$ be the correlation matrix, where $\mathbf{\Theta}^{i}_{m,o}=\mathop{\rm corr}\nolimits(\mathbf{x}^{i}_{m},\mathbf{x}^{i}_{o})$ is the correlation between the time series of the $m^{th}$ and $o^{th}$ nodes. We first define the model for estimating the Sparse Connectivity Patterns (SCPs) [26] in the fMRI data, which decomposes the correlation matrices into a non-negative linear combination of sparse low rank components such that for all $i=1,..,S$ we have $\mathbf{\Theta}^{i}=\mathbf{W}\mathbf{\Lambda}^{i}\mathbf{W}^{T}$, where $\mathbf{W}\in\mathbb{R}^{P\times k}$ is a set of patterns shared across all subjects, $k<P$, and $\mathbf{\Lambda}^{i}\succeq 0$ is a diagonal matrix storing the subject-specific information about the strength of each of the components. Let $\mathbf{w}_{l}\in\mathbb{R}^{P}$ be the $l^{th}$ column of $\mathbf{W}$ such that $-1\preceq\mathbf{w}_{l}\preceq 1$, and let $w_{l,s}$ be the $s^{th}$ element of the vector $\mathbf{w}_{l}$. Then $\mathbf{w}_{l}$ represents a component and reflects the weights of the nodes in that component; if $w_{l,s}$ is zero, then the $s^{th}$ node does not belong to the $l^{th}$ component. If the weights of any two nodes in a component have the same sign, then the nodes are positively correlated; otherwise they are anti-correlated. To make the patterns sparse, each column of $\mathbf{W}$ is subjected to an $L_{1}$ penalty, and the optimization problem below is solved to obtain the SCPs

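$$
\begin{aligned}
\underset{\mathbf{W},\mathbf{\Lambda}}{\text{minimize}}\quad & \sum_{i=1}^{S}\|\mathbf{\Theta}^{i}-\mathbf{W}\mathbf{\Lambda}^{i}\mathbf{W}^{T}\|_{F}^{2}\\
\text{subject to}\quad & \|\mathbf{w}_{l}\|_{1}\leq\lambda,\quad l=1,\dots,k\\
& \|\mathbf{w}_{l}\|_{\infty}\leq 1,\quad l=1,\dots,k\\
& \mathbf{\Lambda}^{i}\succeq 0,\quad i=1,\dots,S
\end{aligned}
\tag{1}
$$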

where $S$ is the total number of subjects and $\lambda$ controls the sparsity of the components.
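The SCP model can be made concrete with a small numerical sketch. The code below is illustrative only: the function names (`correlation_matrix`, `scp_objective`, `scp_feasible`), the toy sizes, and the random data are assumptions of this example rather than part of the original MATLAB implementation. It evaluates the squared Frobenius misfit between each $\mathbf{\Theta}^{i}$ and $\mathbf{W}\mathbf{\Lambda}^{i}\mathbf{W}^{T}$ and checks the $L_{1}$, $L_{\infty}$ and non-negativity constraints.

```python
import numpy as np

def correlation_matrix(X):
    """Return the P x P Pearson correlation matrix of a P x T time-series array."""
    return np.corrcoef(X)

def scp_objective(thetas, W, lambdas):
    """Sum over subjects of ||Theta_i - W diag(lambda_i) W^T||_F^2."""
    total = 0.0
    for theta, lam in zip(thetas, lambdas):
        # (W * lam) scales column l of W by lam[l], i.e. W @ diag(lam).
        residual = theta - (W * lam) @ W.T
        total += np.sum(residual ** 2)
    return total

def scp_feasible(W, lambdas, sparsity):
    """Check ||w_l||_1 <= sparsity, ||w_l||_inf <= 1 and lambda >= 0."""
    l1_ok = np.all(np.abs(W).sum(axis=0) <= sparsity + 1e-12)
    linf_ok = np.all(np.abs(W) <= 1.0 + 1e-12)
    lam_ok = np.all(np.asarray(lambdas) >= 0.0)
    return bool(l1_ok and linf_ok and lam_ok)

# Toy data (hypothetical sizes): S subjects, P regions, T time points, k patterns.
rng = np.random.default_rng(0)
P, T, k, S = 6, 50, 2, 3
thetas = [correlation_matrix(rng.standard_normal((P, T))) for _ in range(S)]

# Two sparse patterns; nodes with opposite signs are anti-correlated in a pattern.
W = np.zeros((P, k))
W[:3, 0] = [1.0, 0.5, -0.5]
W[3:, 1] = [1.0, -1.0, 0.25]
lambdas = rng.uniform(0.1, 1.0, size=(S, k))  # diagonal of each Lambda^i

print(scp_objective(thetas, W, lambdas), scp_feasible(W, lambdas, sparsity=2.5))
```

Storing only the diagonal of each $\mathbf{\Lambda}^{i}$ as a vector avoids forming $k\times k$ matrices and keeps the reconstruction a single broadcasted product.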

II-B Hierarchical Sparse Connectivity Patterns

We have extended the above work and introduced Hierarchical Sparse Connectivity Patterns (hSCPs) to estimate hierarchical sparse low rank patterns in the correlation matrices. In our model, a correlation matrix is decomposed into $K$ levels as -

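$$
\begin{aligned}
\mathbf{\Theta}^{i} &\approx \mathbf{W}_{1}\mathbf{\Lambda}_{1}^{i}\mathbf{W}_{1}^{T}\\
\mathbf{\Theta}^{i} &\approx \mathbf{W}_{1}\mathbf{W}_{2}\mathbf{\Lambda}_{2}^{i}\mathbf{W}_{2}^{T}\mathbf{W}_{1}^{T}\\
&\ \ \vdots\\
\mathbf{\Theta}^{i} &\approx \mathbf{W}_{1}\mathbf{W}_{2}\cdots\mathbf{W}_{K}\mathbf{\Lambda}_{K}^{i}\mathbf{W}_{K}^{T}\mathbf{W}_{K-1}^{T}\cdots\mathbf{W}_{1}^{T}
\end{aligned}
\tag{2}
$$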

where $\mathbf{W}_{1}\in\mathbb{R}^{P\times k_{1}}$ and $\mathbf{W}_{q}\in\mathbb{R}^{k_{q-1}\times k_{q}}$, $\mathbf{\Lambda}_{q}^{i}\in\mathbb{R}^{k_{q}\times k_{q}}$ is a diagonal matrix storing subject-specific information about the patterns, $P\gg k_{1}>k_{2}>...>k_{K}$, $P\gg K$, and $\mathbf{W}^{T}$ is the transpose of $\mathbf{W}$. Here $k_{r}$ is the number of components at the $r^{th}$ level; note that $k_{1}$ is the number of components at the lowermost level of the hierarchy. If we consider a two-layer hierarchical representation of a given correlation matrix, then we can define $\mathbf{Z}_{1}=\mathbf{W}_{1}\mathbf{W}_{2}$ to be a $P\times k_{2}$ matrix. $\mathbf{Z}_{1}$ is then a set of coarse networks, each consisting of a weighted linear combination of the fine-level components in $\mathbf{W}_{1}$, with the weights stored in $\mathbf{W}_{2}$.
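As an illustration of how the levels compose, the following sketch builds the node-level patterns $\mathbf{W}_{1}$, $\mathbf{W}_{1}\mathbf{W}_{2}$, ... and the corresponding approximations of a correlation matrix. The helper names (`cumulative_patterns`, `reconstruct`) and the toy values of $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ are hypothetical and chosen only to make the mixing weights easy to read.

```python
import numpy as np

def cumulative_patterns(Ws):
    """Return [W1, W1 W2, ..., W1 W2 ... WK], the node-level patterns per level."""
    patterns, Y = [], None
    for W in Ws:
        Y = W if Y is None else Y @ W
        patterns.append(Y)
    return patterns

def reconstruct(Ws, lambdas_per_level):
    """Level-r approximation Y_r diag(lambda_r) Y_r^T for every level r."""
    return [(Y * lam) @ Y.T
            for Y, lam in zip(cumulative_patterns(Ws), lambdas_per_level)]

# Hypothetical two-level example: P nodes, k1 fine and k2 coarse components.
rng = np.random.default_rng(1)
P, k1, k2 = 12, 4, 2
W1 = rng.uniform(-1, 1, size=(P, k1))
# Non-negative, sparse mixing weights: each coarse component uses two fine ones.
W2 = np.array([[1.0, 0.0],
               [0.5, 0.0],
               [0.0, 1.0],
               [0.0, 0.7]])

Z1 = cumulative_patterns([W1, W2])[1]   # P x k2 coarse networks
print(Z1.shape)
```

With this $\mathbf{W}_{2}$, the first coarse network equals the first fine component plus half of the second, which is the kind of sparse parent-child relation the hierarchy is meant to expose.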

For better interpretability and noise reduction in the model, and because of our hypothesis that brain subnetworks are relatively sparse [36], we have introduced sparsity constraints on the $\mathbf{W}$ matrices. By making $\mathbf{W}_{1}$ sparse, we force the components to contain a small number of nodes, and by forcing the rest of the $\mathbf{W}$s to be sparse, we force the components at each subsequent level to be sparse linear combinations of the components at the previous level. The hierarchical networks can be estimated by solving the minimization problems below simultaneously under the constraints mentioned above

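$$
\begin{aligned}
\min_{\mathbf{W}_{1},\mathbf{\Lambda}_{1}}\ & \sum_{i=1}^{S}\|\mathbf{\Theta}^{i}-\mathbf{W}_{1}\mathbf{\Lambda}_{1}^{i}\mathbf{W}_{1}^{T}\|_{F}^{2}\\
\min_{\mathbf{W}_{1},\mathbf{W}_{2},\mathbf{\Lambda}_{2}}\ & \sum_{i=1}^{S}\|\mathbf{\Theta}^{i}-\mathbf{W}_{1}\mathbf{W}_{2}\mathbf{\Lambda}_{2}^{i}\mathbf{W}_{2}^{T}\mathbf{W}_{1}^{T}\|_{F}^{2}\\
&\ \ \vdots\\
\min_{\mathcal{W},\mathbf{\Lambda}_{K}}\ & \sum_{i=1}^{S}\|\mathbf{\Theta}^{i}-\mathbf{W}_{1}\mathbf{W}_{2}\cdots\mathbf{W}_{K}\mathbf{\Lambda}_{K}^{i}\mathbf{W}_{K}^{T}\mathbf{W}_{K-1}^{T}\cdots\mathbf{W}_{1}^{T}\|_{F}^{2}
\end{aligned}
\tag{3}
$$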

where $\mathcal{W}=\{\mathbf{W}_{1},..,\mathbf{W}_{K}\}$ . As the above minimization procedures are inter-dependent, we need to solve them jointly. Let $\mathcal{L}=\{\mathbf{\Lambda}_{1},..,\mathbf{\Lambda}_{K}\}$ and $H(\mathcal{W},\mathcal{L})=\sum_{i=1}^{S}\sum_{r=1}^{K}||\mathbf{\Theta}^{i}-(\Pi_{j=1}^{r}\mathbf{W}_{j})\mathbf{\Lambda}_{r}^{i}(\Pi_{n=1}^{r}\mathbf{W}_{n})^{T}||_{F}^{2}$ . The joint minimization problem can be written as below

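$$
\begin{aligned}
\underset{\mathcal{W},\mathcal{L}}{\text{minimize}}\quad & H(\mathcal{W},\mathcal{L})\\
\text{subject to}\quad & \|\mathbf{w}_{l}^{r}\|_{1}<\lambda_{r},\quad l=1,\dots,k_{r}\ \text{and}\ r=1,\dots,K\\
& \|\mathbf{w}_{l}^{r}\|_{\infty}\leq 1,\quad l=1,\dots,k_{r}\ \text{and}\ r=1,\dots,K\\
& \mathbf{W}_{j}\geq 0,\quad j=2,\dots,K\\
& \mathbf{\Lambda}_{r}^{i}\succeq 0,\quad i=1,\dots,S\ \text{and}\ r=1,\dots,K\\
& \mathop{\rm trace}\nolimits(\mathbf{\Lambda}_{r}^{i})=1,\quad i=1,\dots,S\ \text{and}\ r=1,\dots,K
\end{aligned}
\tag{4}
$$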

where the $\mathop{\rm trace}\nolimits$ operator calculates the sum of the diagonal elements of a matrix. In the above minimization problem, the sum of the diagonal values of $\mathbf{\Lambda}_{r}^{i}$ is fixed to $1$ so that the sparsity of $\mathbf{W}$ is not trivially minimized. The optimization problem defined in (4) is non-convex, and we solve it using alternating minimization. Below are the gradients of $H$ with respect to $\mathcal{W}$ and $\mathcal{L}$. Let us first define the following variables

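$$
\mathbf{W}_{0}=\mathbf{I}_{P},\qquad
\mathbf{Y}_{r}=\prod_{j=0}^{r}\mathbf{W}_{j},\qquad
\mathbf{T}_{n,i}^{r}=\Big(\prod_{j=1}^{n-r}\mathbf{W}_{j}\Big)\mathbf{\Lambda}_{n-r}^{i}\Big(\prod_{j=1}^{n-r}\mathbf{W}_{j}\Big)^{T}
$$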

the gradient of $H$ with respect to $\mathbf{\Lambda}_{r}^{i}$ is:

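$$
\frac{\partial H}{\partial\mathbf{\Lambda}_{r}^{i}}=\left(-2\,\mathbf{Y}_{r}^{T}\mathbf{\Theta}^{i}\mathbf{Y}_{r}+2\,\mathbf{Y}_{r}^{T}\mathbf{Y}_{r}\mathbf{\Lambda}_{r}^{i}\mathbf{Y}_{r}^{T}\mathbf{Y}_{r}\right)\circ\mathbf{I}_{k_{r}}
$$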

where $\circ$ is the entry-wise product. The gradient of $H$ with respect to $\mathbf{W}_{r}$ is written as:

[TABLE]

Algorithm 1 describes the complete alternating minimization procedure, where the $\mathop{\rm proj}\nolimits_{1}(\mathbf{W},\lambda)$ operator projects each column of $\mathbf{W}$ onto the intersection of the $L_{1}$ and $L_{\infty}$ balls [37], and $\mathop{\rm proj}\nolimits_{2}$ projects a matrix onto $\mathbb{R}_{+}$ by setting all the negative elements in the matrix to zero. As the gradients are not globally Lipschitz, we do not have bounds on the step size for the gradients. For that reason, we have used AMSGrad [38], ADAM [39] and NADAM [40], which are gradient descent algorithms with adaptive step size. The $\mathop{\rm descent}\nolimits$ function in Algorithm 1 is the update rule used by the different gradient descent techniques. All the code is implemented in MATLAB and will be released upon publication. The cost of computing the gradients with respect to $\mathbf{\Lambda}$ is $\mathcal{O}(KSP^{2}k_{1})$ and with respect to $\mathbf{W}$ is $\mathcal{O}(KSP^{2}k_{1}+K^{2}SPk_{1}^{2})$. The overall cost of Algorithm 1 is the number of iterations $\times\mathcal{O}(KSP^{2}k_{1}+K^{2}SPk_{1}^{2})$. From our previous assumption that $P\gg K$, the final cost is the number of iterations $\times\mathcal{O}(KSP^{2}k_{1})$.
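The building blocks of the alternating scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: `proj_l1_linf` uses a simple bisection on a soft-threshold level rather than the algorithm of [37], the function names are hypothetical, and the $\mathbf{\Lambda}$ gradient is the one obtained by differentiating $\|\mathbf{\Theta}^{i}-\mathbf{Y}_{r}\mathbf{\Lambda}_{r}^{i}\mathbf{Y}_{r}^{T}\|_{F}^{2}$ with respect to the diagonal of $\mathbf{\Lambda}_{r}^{i}$, namely the diagonal of $-2\,\mathbf{Y}_{r}^{T}(\mathbf{\Theta}^{i}-\mathbf{Y}_{r}\mathbf{\Lambda}_{r}^{i}\mathbf{Y}_{r}^{T})\mathbf{Y}_{r}$. The adaptive update rules (AMSGrad, ADAM, NADAM) and the trace constraint are omitted.

```python
import numpy as np

def proj_nonneg(M):
    """proj_2 in the text: set negative entries to zero."""
    return np.maximum(M, 0.0)

def proj_l1_linf(v, lam, iters=60):
    """Project v onto {x : ||x||_1 <= lam, ||x||_inf <= 1}.

    Assumed approach: the projection has the form
    sign(v) * clip(|v| - theta, 0, 1); theta >= 0 is found by bisection.
    """
    clipped = np.clip(v, -1.0, 1.0)
    if np.abs(clipped).sum() <= lam:
        return clipped

    def shrink(theta):
        return np.sign(v) * np.clip(np.abs(v) - theta, 0.0, 1.0)

    lo, hi = 0.0, np.abs(v).max()      # shrink(hi) is the zero vector
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.abs(shrink(mid)).sum() > lam:
            lo = mid
        else:
            hi = mid
    return shrink(hi)                  # hi always satisfies the L1 bound

def objective(thetas, Ws, lambdas):
    """H = sum_i sum_r ||Theta_i - Y_r diag(lambdas[r][i]) Y_r^T||_F^2."""
    total, Y = 0.0, np.eye(thetas[0].shape[0])
    for W, lam_r in zip(Ws, lambdas):
        Y = Y @ W                      # Y_r = W_1 W_2 ... W_r
        for theta, lam in zip(thetas, lam_r):
            total += np.sum((theta - (Y * lam) @ Y.T) ** 2)
    return total

def grad_lambda(theta, Y, lam):
    """Diagonal of -2 Y^T (Theta - Y diag(lam) Y^T) Y."""
    residual = theta - (Y * lam) @ Y.T
    return -2.0 * np.einsum("pk,pq,qk->k", Y, residual, Y)

# Central finite-difference check of the Lambda gradient on toy data.
rng = np.random.default_rng(2)
P, k1, k2 = 8, 4, 2
theta = np.corrcoef(rng.standard_normal((P, 20)))
Ws = [rng.uniform(-1, 1, (P, k1)), rng.uniform(0, 1, (k1, k2))]
lambdas = [[rng.uniform(0, 1, k1)], [rng.uniform(0, 1, k2)]]

g = grad_lambda(theta, Ws[0] @ Ws[1], lambdas[1][0])
eps, e0 = 1e-4, np.array([1.0, 0.0])
plus = [lambdas[0], [lambdas[1][0] + eps * e0]]
minus = [lambdas[0], [lambdas[1][0] - eps * e0]]
fd = (objective([theta], Ws, plus) - objective([theta], Ws, minus)) / (2 * eps)
print(g[0], fd)
```

Because $H$ is quadratic in each $\mathbf{\Lambda}_{r}^{i}$, the central difference agrees with the analytic gradient up to round-off, which makes this a convenient sanity check before plugging the gradient into any adaptive update rule.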

In the above formulation, the lowest level has the largest number of components, $k_{1}$, and the level after that has $k_{2}$ components which are linear combinations of the components at the previous level, and so on. In this way, we have built up a hierarchical model where each component is a linear combination of components at the previous level of the hierarchy. Note that we cannot use only the last decomposition in the above architecture to get the hierarchy, as different layers have different ranks and different approximations; hence we need all the approximations to build the hierarchical structure. In addition, one might expect $\mathbf{W}_{2}$ and the subsequent $\mathbf{W}$s to be degenerate, but that would be the case only when $\mathbf{W}_{1}$ is an orthogonal matrix. Consider the case of a two-level hierarchy: we can obtain a better approximation by taking a linear combination of the columns of $\mathbf{W}_{1}$, which we have also observed empirically.

II-C Initialization procedure for Gradient Descent

The single-level matrix decomposition considered in hSCP is structurally similar to Singular Value Decomposition (SVD), but with dependent components and sparsity added. Hence, we believe that the final estimated components are a modification of singular vectors. Thus, we initialize $\mathcal{W}$ and $\mathcal{L}$ in Algorithm $1$ by taking the SVD of the input data matrix. This helps in making the algorithm deterministic. Define $\bar{\Theta}$ as the sample mean of $\Theta^{i}$. We then compute the rank-$k$ SVD of $\bar{\Theta}$ and obtain $U$ and $V$ such that $UVU^{T}$ is the rank-$k$ SVD of $\bar{\Theta}$. We then initialize $W_{1}$ by $U$ and $\Lambda_{1}^{i}$ by $V^{i}$, where $V^{i}$ is obtained by taking the rank-$k$ SVD of $\Theta^{i}$ as described in Algorithm 2. For $r>1$, $W_{r}$ can be initialized as a permutation matrix and $\Lambda_{r}$ by the top $k_{r}$ diagonal elements of $\Lambda_{r-1}$, so that we do not have to perform an SVD at each level. We empirically show in the next section that SVD initialization results in faster convergence.
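A rough sketch of this initialization is given below. Algorithm 2 itself is not reproduced in the text, so several details are assumptions of this example: the helper names (`top_k_eig`, `svd_init`), the use of a symmetric eigendecomposition (which coincides with the SVD for positive semi-definite correlation matrices), and the choice of a truncated identity as the "permutation matrix" for levels $r>1$.

```python
import numpy as np

def top_k_eig(M, k):
    """Top-k eigenpairs of a symmetric matrix, sorted by decreasing eigenvalue."""
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order], vals[order]

def svd_init(thetas, ks):
    """Deterministic starting point for W_1..W_K and per-subject strengths.

    thetas: list of P x P correlation matrices; ks: [k_1, ..., k_K], decreasing.
    Returns Ws (list of matrices) and lambdas (list of S x k_r arrays).
    """
    mean_theta = np.mean(thetas, axis=0)
    U, _ = top_k_eig(mean_theta, ks[0])
    Ws = [U]                                                    # W_1 <- U
    lambdas = [np.array([top_k_eig(t, ks[0])[1] for t in thetas])]
    for k_prev, k_next in zip(ks[:-1], ks[1:]):
        # Assumed choice: a k_prev x k_next truncated identity, which simply
        # keeps the first k_next components of the previous level.
        Ws.append(np.eye(k_prev, k_next))
        lambdas.append(lambdas[-1][:, :k_next])                 # top k_next strengths
    return Ws, lambdas

# Toy usage with hypothetical sizes.
rng = np.random.default_rng(5)
demo_thetas = [np.corrcoef(rng.standard_normal((6, 40))) for _ in range(4)]
demo_Ws, demo_lambdas = svd_init(demo_thetas, [4, 2])
print([W.shape for W in demo_Ws], [lam.shape for lam in demo_lambdas])
```

The starting point does not yet satisfy the sparsity or trace constraints; in a full pipeline it would presumably be passed through the projection steps before the first gradient update.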

III Experiments

III-A Dataset

We used two real datasets to demonstrate the effectiveness of the method; they are described below.

• HCP: The Human Connectome Project (HCP) [34] dataset is one of the most widely used datasets for fMRI analysis. We used the fMRI scans of $100$ unrelated subjects as provided in the HCP $900$ subjects data release [41], which were processed using the ICA+FIX pipeline with MSMAll registration [42]. Each subject has $4004$ time points, and the time series were normalized to zero mean and unit L2 norm and averaged over the 360 nodes of the multimodal HCP parcellation [43].

• PNC: The Philadelphia Neurodevelopmental Cohort (PNC) [35] dataset contains $969$ subjects (ages from $8$ to $22$), each having $120$ time points and $121$ nodes as described in [44]. The data were preprocessed using an optimized procedure [45] which includes slice timing, confound regression, and band-pass filtering.

III-B Convergence Analysis

We compare AMSGrad, ADAM, NADAM and vanilla gradient descent with SVD initialization and random initialization by measuring percentage error which is defined as:

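$$
\frac{\sum_{i=1}^{S}\sum_{r=1}^{K}\|\mathbf{\Theta}^{i}-(\prod_{j=1}^{r}\mathbf{W}_{j})\mathbf{\Lambda}_{r}^{i}(\prod_{n=1}^{r}\mathbf{W}_{n})^{T}\|_{F}^{2}}{\sum_{i=1}^{S}\sum_{r=1}^{K}\|\mathbf{\Theta}^{i}\|_{F}^{2}}
$$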

For a fair comparison, we set $\beta_{1}=0.9$ and $\beta_{2}=0.99$ for the ADAM, NADAM and AMSGrad algorithms, where $\beta_{1}$ and $\beta_{2}$ are the hyperparameters used in the update rules of the gradient descent algorithms. These values are typically used as parameter settings for adaptive gradient descent algorithms [38]. Figure 1 shows the convergence of the algorithm on the complete HCP data for two different combinations of sparsity parameters at a particular set of $k_{1}$ and $k_{2}$. From Figure 1 we can see that AMSGrad has the best convergence and that SVD initialization gives a better convergence rate. For the rest of the experiments we have used the AMSGrad algorithm with SVD initialization to perform gradient descent.
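The convergence measure can be computed directly from the current factors. The sketch below assumes the error is the residual energy summed over subjects and levels, divided by the data energy counted once per level; the function name `percentage_error` and the data layout (`lambdas[r][i]` holding the diagonal of $\mathbf{\Lambda}_{r}^{i}$) are choices made for this example.

```python
import numpy as np

def percentage_error(thetas, Ws, lambdas):
    """Residual energy over all subjects and levels, divided by the data energy."""
    numerator, Y = 0.0, np.eye(thetas[0].shape[0])
    for W, lam_r in zip(Ws, lambdas):
        Y = Y @ W                                  # W_1 W_2 ... W_r
        for theta, lam in zip(thetas, lam_r):
            numerator += np.sum((theta - (Y * lam) @ Y.T) ** 2)
    # Each subject's energy is counted once per level (K = len(Ws) times).
    denominator = len(Ws) * sum(np.sum(theta ** 2) for theta in thetas)
    return numerator / denominator

# Toy check: an exact single-level factorization gives zero error,
# and all-zero strengths give an error of one.
rng = np.random.default_rng(3)
W_demo = rng.uniform(-1, 1, (5, 2))
lam_demo = np.array([0.5, 2.0])
theta_demo = (W_demo * lam_demo) @ W_demo.T
print(percentage_error([theta_demo], [W_demo], [[lam_demo]]),
      percentage_error([theta_demo], [W_demo], [[np.zeros(2)]]))
```

Tracking this ratio after every iteration gives a scale-free curve that can be compared across optimizers and initializations.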

III-C Simulation

To evaluate the performance of the proposed model, we first use synthetic data. We compared the hierarchical components extracted from hSCP to hierarchical overlapping communities obtained using EAGLE [17] and OSLOM [18]. Implementation of EAGLE and OSLOM was obtained from the authors.

We randomly generate $\mathbf{V}_{1}\in\mathbb{R}^{p\times k_{1}}$ with a fraction of non-zeros equal to $\mu_{1}$, $\mathbf{W}_{2}\in\mathbb{R}^{k_{1}\times k_{2}}$ with a fraction of non-zeros equal to $\mu_{2}$, and $\mathbf{\Lambda}^{i}\in\mathbb{R}^{k_{2}\times k_{2}}$ for $i=1,..,n$. The goal is to generate matrices $\mathbf{V}_{1}\mathbf{W}_{2}\mathbf{\Lambda}^{i}\mathbf{W}_{2}^{T}\mathbf{V}_{1}^{T}$ which are close to a correlation matrix. For this, we first take the mean of all $\mathbf{\Lambda}^{i}$, $\mathbf{U}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{\Lambda}^{i}$, and generate $\mathbf{T}=\mathbf{V}_{1}\mathbf{W}_{2}\mathbf{U}\mathbf{W}_{2}^{T}\mathbf{V}_{1}^{T}$. Now, let $\mathbf{D}$ be the diagonal matrix containing the diagonal elements of $\mathbf{T}$. To make $\mathbf{T}$ a correlation matrix, we rescale $\mathbf{V}_{1}$ by multiplying it by $\mathbf{D}^{-\frac{1}{2}}$, which sets every diagonal entry to one. Let $\mathbf{W}_{1}=\mathbf{D}^{-\frac{1}{2}}\mathbf{V}_{1}$; then $\mathbf{R}=\mathbf{W}_{1}\mathbf{W}_{2}\mathbf{U}\mathbf{W}_{2}^{T}\mathbf{W}_{1}^{T}$ is a correlation matrix. Now, we generate the correlation matrix for each subject by using the equation below

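$$
\mathbf{\Theta}^{i}=\mathbf{W}_{1}\mathbf{W}_{2}\mathbf{\Lambda}^{i}\mathbf{W}_{2}^{T}\mathbf{W}_{1}^{T}+\mathbf{E}_{i}
$$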

As the matrix $\mathbf{W}_{1}\mathbf{W}_{2}\mathbf{\Lambda}^{i}\mathbf{W}_{2}^{T}\mathbf{W}_{1}^{T}$ is close to, but not exactly, a correlation matrix, we add $\mathbf{E}_{i}\in\mathbb{R}^{p\times p}$ such that $\mathbf{\Theta}^{i}$ becomes a correlation matrix. For the experiments, the parameters were set as follows: $n=300$, $p=100$, $k_{1}=20$, $k_{2}=10$, $\mu_{1}=0.4$ and $\mu_{2}=0.5$.
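The generation procedure can be sketched as below. Several choices are assumptions of this example because the text leaves them open: the sampling distributions for $\mathbf{V}_{1}$, $\mathbf{W}_{2}$ and $\mathbf{\Lambda}^{i}$, the use of the inverse square root of the diagonal to normalize $\mathbf{T}$ to unit diagonal, and a purely diagonal $\mathbf{E}_{i}$. A diagonal correction fixes the unit diagonal but does not by itself guarantee positive semi-definiteness when the $\mathbf{\Lambda}^{i}$ vary widely, so this should be read as a starting point only.

```python
import numpy as np

def simulate(n=300, p=100, k1=20, k2=10, mu1=0.4, mu2=0.5, seed=0):
    """Generate n synthetic two-level 'correlation-like' matrices."""
    rng = np.random.default_rng(seed)
    # Sparse signed fine-level patterns and sparse non-negative mixing weights.
    V1 = rng.uniform(-1, 1, (p, k1)) * (rng.random((p, k1)) < mu1)
    W2 = rng.uniform(0, 1, (k1, k2)) * (rng.random((k1, k2)) < mu2)
    lambdas = rng.uniform(0.5, 1.5, (n, k2))   # diagonals of Lambda^i (assumed range)
    U = lambdas.mean(axis=0)                   # diagonal of the mean Lambda

    Z = V1 @ W2
    d = np.sum(Z ** 2 * U, axis=1)             # diagonal of T = Z diag(U) Z^T
    d = np.where(d > 0, d, 1.0)                # guard against all-zero rows
    W1 = V1 / np.sqrt(d)[:, None]              # row scaling by D^{-1/2}

    Y = W1 @ W2
    R = (Y * U) @ Y.T                          # mean-level matrix, unit diagonal
    thetas = []
    for lam in lambdas:
        M = (Y * lam) @ Y.T
        E = np.diag(1.0 - np.diag(M))          # assumed diagonal correction E_i
        thetas.append(M + E)
    return W1, W2, lambdas, np.array(thetas), R

# Small hypothetical run; the paper's settings are the function defaults.
W1, W2, lambdas, thetas, R = simulate(n=10, p=30, k1=8, k2=4)
print(thetas.shape)
```

The ground-truth $\mathbf{W}_{1}$ and $\mathbf{W}_{1}\mathbf{W}_{2}$ returned here are what an extracted fine-scale or coarse-scale component would be compared against.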

We compare components derived from hSCP with $k_{1}\in\{10,15,20\}$ , $k_{2}\in\{5,6,8\}$ , $\lambda_{1}\in P\times 5(10^{[-3:-1]})$ and $\lambda_{2}\in k_{1}\times 10^{[-3:-1]}$ . By varying $\lambda$ values, we generate components with different sparsity. We first compare fine-scale and coarse-scale components separately to demonstrate the effect of sparsity on the performance. For a fixed $k_{1}$ and $\lambda_{1}$ , we find $k_{2}$ and $\lambda_{2}$ giving the maximum similarity with the ground truth and for a fixed $k_{2}$ and $\lambda_{2}$ , we find $k_{1}$ and $\lambda_{1}$ giving the maximum similarity with the ground truth over $10$ runs. Here the similarity is defined as the average correlation between extracted and the ground truth components. Fig. 2 shows the similarity of the fine-scale and coarse-scale components with the ground truth. From the figure, we can see that the hSCP can extract components that are highly similar to the ground truth. Also, as the fine-scale components become sparse, the similarity decreases. Next, we compare hSCP to EAGLE and OSLOM. Hierarchical components and communities with $k_{1}\in\{5,6,8\}$ , $k_{2}\in\{10,15,20\}$ were extracted from hSCP, EAGLE and OSLOM. The correlation matrix averaged across all the subjects was used as an input to EAGLE and OSLOM. For hSCP, among different values of $\lambda_{1}$ and $\lambda_{2}$ , we extract components at level $k_{1}$ and $k_{2}$ , which have maximum similarity with the ground truth. Table I shows the similarity of the extracted components with the ground truth. Some cells in the tables are empty as the EAGLE and OSLOM algorithms were not able to generate hierarchical structures for particular values of $k_{1}$ and $k_{2}$ . It can be seen that the hSCP method can extract the components which are closer to ground truth as compared to other methods.

III-D Comparison with single scale components

We also compared the reproducibility of the shared components extracted from the hierarchical model (hSCP) versus single scale components (SCP). Reproducibility here is defined as the normalized inner product of components derived from the two equal random sub-samples of the data averaged across all the components. We decomposed the correlation matrix into two levels to demonstrate the advantages of hierarchical factorization and show proof of concept. There might not be a single $K$ that best describes the data, and the algorithm allows us to investigate the continuum of functional connectivity patterns at different $K$ s. We compare components derived from hSCP with $k_{1}\in\{10,15,20,25\}$ , $k_{2}\in\{4,5,6,8\}$ , $\lambda_{1}\in P\times 5(10^{[-3:-1]})$ and $\lambda_{2}\in k_{1}\times 10^{[-3:-1]}$ , and from SCP with $k\in\{4,5,6,8,10,15,20,25\}$ at $\lambda\in P\times 5(10^{[-4:-1]})$ . At a fixed $k_{2}$ and $\lambda_{2}$ , we find the optimal $k_{1}$ and $\lambda_{1}$ by dividing the data into three equal parts: training, validation, and test data, and choosing the parameters corresponding to maximum mean reproducibility over $20$ runs on training and validation set. Figure 3 and Figure 4 show the reproducibility of the components averaged over $20$ runs on training and test data. We can see that the same number of components extracted from the second level using hSCP are, on average, more reproducible than the components extracted using SCP.
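One way to compute such a reproducibility score is sketched below. The text does not say how components from the two sub-samples are paired or how sign ambiguity is handled, so this example assumes that each component from the first split is matched to its best-aligned component in the second split and that the absolute value of the normalized inner product is used (a column of $\mathbf{W}$ and its negation give the same $\mathbf{W}\mathbf{\Lambda}\mathbf{W}^{T}$). The function name `reproducibility` is hypothetical.

```python
import numpy as np

def reproducibility(Wa, Wb):
    """Mean best-match normalized inner product between columns of Wa and Wb."""
    def unit_columns(W):
        norms = np.linalg.norm(W, axis=0, keepdims=True)
        return W / np.where(norms > 0, norms, 1.0)   # leave zero columns as zero

    # cosine[a, b] = |<w_a, w_b>| / (||w_a|| ||w_b||)
    cosine = np.abs(unit_columns(Wa).T @ unit_columns(Wb))
    return float(cosine.max(axis=1).mean())

# Toy check: permuting and sign-flipping components leaves the score at 1.
rng = np.random.default_rng(7)
W_split_a = rng.standard_normal((10, 3))
W_split_b = -W_split_a[:, [2, 0, 1]]
print(reproducibility(W_split_a, W_split_b))
```

For the coarse level, the same function would be applied to the node-level patterns $\mathbf{W}_{1}\mathbf{W}_{2}$ estimated on each split. A one-to-one assignment could replace the greedy best match if duplicate matches are a concern.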

III-E Comparison of hSCP with existing approaches

We compared the reproducibility of hierarchical components extracted from hSCP to hierarchical overlapping communities obtained using EAGLE [17] and OSLOM [18]. Implementations of EAGLE and OSLOM were obtained from the authors. The correlation matrix averaged across all the subjects was used as an input to EAGLE and OSLOM. Hierarchical components and communities with $k_{1}\in\{4,5,6,8\}$ , $k_{2}\in\{10,15,20,25\}$ were generated from hSCP, EAGLE and OSLOM. Optimal $\lambda_{1}$ and $\lambda_{2}$ for hSCP were selected by dividing the data into three equal parts (training, validation and test sets) and performing the validation procedure described in Section III-D. Reproducibility was computed using the training and test sets for all the methods and all combinations of $k_{1}$ and $k_{2}$ . Table II and Table III show the reproducibility results on the HCP and PNC datasets. For a particular $k_{1}$ and $k_{2}$ , the reproducibility tables show the average of the two reproducibility values. The results show that the hSCPs have better reproducibility than the communities derived using EAGLE and OSLOM.

III-F Age prediction

We compared the predictive power of the hierarchical components extracted from hSCP, EAGLE and OSLOM on the age prediction problem. Using the PNC dataset, we first extracted the components and their strength ( $\mathbf{\Lambda}$ ) for each individual. These strength values were then used to predict the age of each individual using linear regression. The Pearson correlation coefficient and the mean absolute error (MAE) between the predicted brain age and the true age were used as the performance measures for comparison. Table IV summarizes the results. To determine whether our results are significantly better, we performed the Wilcoxon signed-rank test, because the underlying distribution of the performance measures is unknown. As a lower MAE is preferred, we performed a left-tailed hypothesis test when MAE is used as the performance measure. A right-tailed hypothesis test is performed when correlation is used as the performance measure, because a higher correlation is better. The null hypotheses in the two cases are:

• There is no difference between correlation values obtained from our method compared to other methods

• There is no difference between MAE values obtained from our method compared to other methods

Table V demonstrates that the prediction model built using hSCPs performed significantly better (p-value $<0.05$ ) than the models built using EAGLE and OSLOM components in the majority of the cases. This indicates that the hSCPs were more informative for predicting brain age. One of the reasons for the poor performance of EAGLE was that it only estimated whether a region is present or absent in a component. In contrast, hSCP can determine the strength of the presence and thus has more degrees of freedom, resulting in better performance.
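The evaluation pipeline described in this section (linear regression on component strengths, scored by Pearson correlation and MAE) can be sketched as below. This is a minimal sketch on synthetic data, not the PNC analysis: the helper names, the number of components, and the single train/test split are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients (intercept first) to new strength values."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def evaluate(y_true, y_pred):
    """Pearson correlation and mean absolute error of the predictions."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    mae = np.mean(np.abs(y_true - y_pred))
    return float(r), float(mae)

# Synthetic stand-in: rows are subjects, columns are component strengths.
rng = np.random.default_rng(0)
n_subjects, n_components = 200, 14  # hypothetical sizes
strengths = rng.uniform(0.0, 1.0, size=(n_subjects, n_components))
true_weights = rng.normal(size=n_components)
age = 15.0 + strengths @ true_weights + 0.1 * rng.normal(size=n_subjects)

# Assumed evaluation protocol: fit on the first 150 subjects, score on the rest.
train, test = slice(0, 150), slice(150, None)
coef = fit_linear_regression(strengths[train], age[train])
r, mae = evaluate(age[test], predict(coef, strengths[test]))
print(f"Pearson r = {r:.3f}, MAE = {mae:.3f}")
```

Given paired per-run scores for two methods, a one-sided Wilcoxon signed-rank test such as the one described above could then be run with, for example, `scipy.stats.wilcoxon` and its `alternative` argument.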

III-G Clustering

An extension of the above method is presented below, which estimates hSCPs for better clustering of the data and capturing of heterogeneity. Data clustering is performed using subject specific information of the components. We add a penalty term for clustering in the objective function given in problem 4. The modified objective function is given in problem 7. The joint minimization problem for estimating hSCPs and using their subject specific information for clustering is given below:

[TABLE]

where $\mathbf{c}^{l}_{r}$ is the $l^{th}$ cluster center at the $r^{th}$ level, $L$ is the number of clusters, $\mathcal{C}=\{\mathbf{c}^{l}_{r}\}_{l=1:L,r=1:K}$ , $\mathbf{x}^{i}_{r}$ stores the diagonal elements of $\mathbf{\Lambda}^{i}_{r}$ and $\mathcal{M}_{l}$ stores whether the $i^{th}$ subject belongs to the $l^{th}$ cluster. In the above problem, the penalty $||\frac{{x}^{i}_{d,r}}{\|\mathbf{x}_{d,r}\|}-{c}_{d,r}^{l}||^{2}$ incorporates clustering by penalizing the distance between the points in a cluster and the cluster center, and ${x}^{i}_{d,r}$ is divided by $\|\mathbf{x}_{d,r}\|$ for normalization. The above non-convex problem can be solved using alternating minimization, in the same way as problem 4 is solved in Section II-B. Alg 3 provides the complete procedure. In Alg 3, k-means( $\mathbf{\Lambda}_{r}^{i}$ ) applies k-means clustering [46] to $\mathbf{\Lambda}_{r}^{i}$ and outputs $\mathcal{C}$ as the cluster centers and $\{\mathcal{M}_{l}\}_{l=1:L}$ as the cluster assignments of $\mathcal{L}$ . We ran the algorithm on HCP data to extract the components and the clusters in the data. The number of clusters was selected by first extracting the hierarchical components without the penalty term and then clustering the data by using k-means on $\mathcal{L}$ ; the elbow method gave $L=2$ . The number of coarse-scale components was set to 4 and the number of fine-scale components to 10, since these values exhibited the highest reproducibility between the training and test sets. Figure 5 and Figure 6 show the distribution of fine and coarse components in the two clusters. From Figure 6, we can see that components A and C are more prominent in cluster 2 than in cluster 1, and components B and D are more prominent in cluster 1 than in cluster 2. The algorithm has forced Component A and its sub-components to have higher weights in one cluster. For component B, however, sub-components 2 and 3 are prominent in cluster 1 and sub-component 1 is prominent in cluster 2, as can be seen in Figure 5. From Figures 5 and 6, it can be seen that our method can reveal heterogeneity in the population by capturing the strength of the components’ presence in each individual.
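The clustering penalty and the elbow-based choice of the number of clusters can be illustrated with a small sketch. It covers only the clustering step on fixed strength values, not the joint alternating minimization of Alg 3; the function names, the plain Lloyd's k-means, and the synthetic two-group data are assumptions made for illustration.

```python
import numpy as np

def normalize_strengths(X):
    """X: (n_subjects, k) strengths at one level. Each column (component d)
    is divided by its norm across subjects, mirroring x_{d,r} / ||x_{d,r}||."""
    return X / np.linalg.norm(X, axis=0, keepdims=True)

def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for l in range(n_clusters):
            if np.any(labels == l):  # keep the old center if a cluster empties
                centers[l] = X[labels == l].mean(axis=0)
    # Recompute the assignments so they match the final centers.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)

def clustering_penalty(X, centers, labels):
    """Sum of squared distances between each subject and its cluster center."""
    return float(((X - centers[labels]) ** 2).sum())

# Synthetic stand-in: two groups of subjects with different component strengths.
rng = np.random.default_rng(1)
group_a = np.array([1.0, 0.2]) + 0.05 * rng.normal(size=(20, 2))
group_b = np.array([0.2, 1.0]) + 0.05 * rng.normal(size=(20, 2))
X = normalize_strengths(np.vstack([group_a, group_b]))

# Elbow method: inspect how the penalty drops as the cluster count grows.
penalties = {}
for n_clusters in (1, 2, 3, 4):
    centers, labels = kmeans(X, n_clusters)
    penalties[n_clusters] = clustering_penalty(X, centers, labels)
    print(n_clusters, round(penalties[n_clusters], 5))
```

In this toy setting the penalty falls sharply from one to two clusters and only marginally afterwards, which is the kind of elbow used to pick the number of clusters.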

III-H Results from resting state fMRI

Figure 7 displays the 10 fine level components, the 4 coarse level components and the hierarchical structure. It can be clearly seen from Figure 7 that the fine and coarse level components are overlapping and sparse, and coarse components are comprised of a sparse linear combination of fine level components which helps in discovering the relation between different networks at different scales.

The ten fine level components obtained show the relation between different functional networks and are similar to the SCPs extracted in [26]. From Fig. 7, it can be seen that our approach can separate task-positive regions and their associated task-negative regions into separate patterns without using traditional seed-based methods that require knowledge of a seed region of interest. Various studies have found that task-positive regions are positively correlated with each other, that task-negative regions are positively correlated with each other, and that the regions in the two networks are negatively correlated with each other, which aligns with our results [47, 48]. Component 2 covers the Default Mode Network (DMN) and the Dorsal Attention Network, which are anti-correlated with each other. This is a well-known finding, previously described using the seed-based correlation method [22]. Anti-correlations between different brain regions can represent interactions that depend on the state of the brain. As our method does not capture dynamics, it has captured the interactions between different regions in different components. An example is the anti-correlation between the default mode network and the task-positive network; these interactions are thought to be facilitated by indirect anatomical connections between the regions of the two networks [49]. Component 8 shows different regions of the DMN anti-correlated with sensorimotor regions, as described in a separate study [50]. Component C comprises three connectivity patterns that involve the sensorimotor areas and their anti-correlations. Component A consists of the Visual Network and the Ventral Attention Network, which are anti-correlated with each other. From Fig. 7, we can see that parts of the sensorimotor and emotion networks [51] are anti-correlated with each other. These connections highlighted by our method are corroborated by the fact that these regions have direct anatomical pathways [52]. These negatively correlated networks can highlight different interactions in different brain regions, such as suppression, inhibition, and neurofeedback. An extension of this method that estimates dynamic components could help us understand the different anti-correlation mechanisms between the regions. Future research is needed to understand more about the anti-correlations and the source of these interactions.

Our study also finds that, compared to the primary sensory cortex, the higher-order association cortex has more associations in different components, as shown in previous studies [53, 54]. Traditional seed-based approaches have been used to show that these regions have functional connectivity with more heterogeneous regions, implying that they receive inputs from and send outputs to more diverse brain regions [55, 56]. Thus, allowing overlapping components and both positive and negative correlations within the same component provides additional insights. These features of the method make it possible to store the relation of various overlapping regions within a functional system with other areas by assigning them to different components.

Another important observation is that each of the coarse components comprises fine level components that contain major functional networks and their relation with other nodes. For instance, coarse component B consists mainly of fine components 2 and 3, which store the link between regions of the Default Mode Network and other nodes in the brain. Similarly, coarse component D stores the relationship between visual areas and the rest of the brain regions using fine components 4 and 5. Thus, hSCPs can provide novel insights into the functioning of the brain by jointly uncovering both fine and coarse level components, with the coarse components comprised of similarly functioning fine components.
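The relation between the two levels can be made concrete with a toy example: each coarse component is a sparse linear combination of fine components, so the nonzero mixing weights identify which fine components belong to which coarse component. The matrix names `W1` and `W2`, the sizes, and the weights below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sizes: P regions, k1 fine components, k2 coarse components.
P, k1, k2 = 6, 4, 2
rng = np.random.default_rng(0)
W1 = rng.normal(size=(P, k1))  # fine components, one per column

# Sparse mixing weights: column c lists the contribution of each fine
# component to coarse component c (illustrative values).
W2 = np.zeros((k1, k2))
W2[[0, 1], 0] = [0.7, 0.3]
W2[[2, 3], 1] = [0.5, 0.5]

# Coarse components as linear combinations of the fine components.
coarse = W1 @ W2

def members(W2, tol=1e-8):
    """Indices of the fine components that contribute to each coarse component."""
    return {c: np.flatnonzero(np.abs(W2[:, c]) > tol).tolist()
            for c in range(W2.shape[1])}

print(members(W2))
```

Reading the hierarchy off the sparsity pattern in this way is what allows statements such as "coarse component B consists mainly of fine components 2 and 3".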

IV DISCUSSION AND CONCLUSIONS

In this work, we proposed a novel technique for hierarchical extraction of sparse components from correlation matrices, with application to rsfMRI data. The proposed method is a cascaded joint matrix factorization problem in which the correlation matrix corresponding to each individual’s data is considered as an independent observation, thus allowing us to model inter-subject variability. We formulated the problem as a non-convex optimization with sparsity constraints. It is important to note that, as the decomposition is not by itself unique, the ability to reproducibly recover components hinges on imposing sparsity in the decomposition, which appears to provide useful and reproducible representations. We used a gradient descent algorithm with an adaptive learning rate to solve the optimization problem. Compared to the implementation of SCP, which used random initialization, we used SVD initialization, which made the complete algorithm both deterministic and faster.
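To illustrate why an SVD-based initialization is deterministic, the sketch below initializes a component matrix from the leading singular vectors of the subject-averaged correlation matrix. The exact initialization scheme is not given in this section, so the averaging step, the square-root scaling, the sign convention, and the function name are all assumptions.

```python
import numpy as np

def svd_init(corr_matrices, k):
    """Deterministic initialization from the top-k singular vectors.

    corr_matrices: (n_subjects, P, P) stack of correlation matrices.
    Assumed scheme: decompose the mean matrix, scale each vector by the
    square root of its singular value, and fix the sign of each column.
    """
    mean_corr = np.mean(corr_matrices, axis=0)
    U, s, _ = np.linalg.svd(mean_corr)
    W = U[:, :k] * np.sqrt(s[:k])
    # Singular vectors are defined up to sign; make the largest-magnitude
    # entry of each column positive so repeated runs agree exactly.
    idx = np.abs(W).argmax(axis=0)
    signs = np.sign(W[idx, np.arange(k)])
    return W * signs

# Synthetic stand-in: correlation matrices from random time series.
rng = np.random.default_rng(0)
corr_matrices = np.stack(
    [np.corrcoef(rng.normal(size=(8, 100))) for _ in range(5)]
)
W_init = svd_init(corr_matrices, 3)
print(W_init.shape)
```

Unlike a random start, calling the function twice on the same data returns the same matrix, so downstream gradient descent follows the same trajectory on every run.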

In addition to shared patterns, we are able to extract the ‘strength’ of these patterns in individual components, thus capturing heterogeneity across data. Experimentally, we showed that our method is able to find a sparse, low-rank hierarchical decomposition using cascaded matrix factorization which is highly reproducible across datasets. Experimental results using the PNC dataset demonstrate that the hierarchical components extracted using our model could better predict brain age compared to EAGLE and OSLOM. We also show, using the HCP dataset, that our model is able to capture heterogeneity. Our model computationally extracts a set of hierarchical components common across subjects, including resting state networks. At the same time, we capture individual information about subjects as a linear combination of these hierarchical components, making it a useful measure for group studies. Importantly, our work provides a method to uncover hierarchical organization in the functioning of the human brain.

There are several directions for future work. First, it is possible to extend the idea to estimate dynamic hierarchical components similar to [57], which can help reveal how the hierarchical networks vary over time. Second, generative-discriminative models can be built on top of hSCP to find the components which are highly discriminative of particular groups. For example, such a model can estimate the hierarchical components which are most discriminative of a neurodegenerative disorder. Third, it would be interesting to find a guarantee on the estimation error of the hierarchical components. One possible approach is to adapt the proof techniques of [58]. Finally, future studies incorporating cognitive, clinical, and genetic data might elucidate the biological underpinning and clinical significance of the heterogeneity captured by our approach. Such studies are beyond the scope of the current paper.

Bibliography

[1] O. Sporns, Networks of the Brain. MIT Press, 2010.
[2] G. Doucet, M. Naveau, L. Petit, N. Delcroix, L. Zago, F. Crivello, G. Jobard, N. Tzourio-Mazoyer, B. Mazoyer, E. Mellet et al., “Brain activity at rest: a multiscale hierarchical functional organization,” Journal of neurophysiology, vol. 105, no. 6, pp. 2753–2763, 2011.
[3] H.-J. Park and K. Friston, “Structural and functional brain networks: from connections to cognition,” Science, vol. 342, no. 6158, p. 1238411, 2013.
[4] L. Ferrarini, I. M. Veer, E. Baerends, M.-J. van Tol, R. J. Renken, N. J. van der Wee, D. J. Veltman, A. Aleman, F. G. Zitman, B. W. Penninx et al., “Hierarchical functional modularity in the resting-state human brain,” Human brain mapping, vol. 30, no. 7, pp. 2220–2231, 2009.
[5] D. Meunier, R. Lambiotte, A. Fornito, K. Ersche, and E. T. Bullmore, “Hierarchical modularity in human brain functional networks,” Frontiers in neuroinformatics, vol. 3, p. 37, 2009.
[6] C. F. Beckmann, M. De Luca, J. T. Devlin, and S. M. Smith, “Investigations into resting-state connectivity using independent component analysis,” Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 360, no. 1457, pp. 1001–1013, 2005.
[7] H. Eavani, R. Filipovych, C. Davatzikos, T. D. Satterthwaite, R. E. Gur, and R. C. Gur, “Sparse dictionary learning of resting state fmri networks,” in 2012 Second International Workshop on Pattern Recognition in Neuro Imaging. IEEE, 2012, pp. 73–76.
[8] E. Bullmore and O. Sporns, “Complex brain networks: graph theoretical analysis of structural and functional systems,” Nature reviews neuroscience, vol. 10, no. 3, p. 186, 2009.