Convolutional Neural Network on Semi-Regular Triangulated Meshes and its   Application to Brain Image Data

Caoqiang Liu; Hui Ji; Anqi Qiu

arXiv:1903.08828·cs.LG·April 16, 2019


TL;DR

This paper introduces a novel convolutional neural network designed for semi-regular triangulated meshes, enabling effective analysis of brain MRI data for disease classification, with improved spatial convolution operations.

Contribution

The paper presents a vertex-based CNN on semi-regular meshes with directly defined convolution and down-sampling, tailored for 3D brain imaging data analysis.

Findings

1. Effective classification of MCI and AD using the proposed CNN.

2. A comparison shows improved performance over the spectral graph CNN.

3. Demonstrated applicability on a large MRI dataset from ADNI.

Abstract

We developed a convolutional neural network (CNN) on semi-regular triangulated meshes whose vertices have 6 neighbours. The key blocks of the proposed CNN, including convolution and down-sampling, are defined directly in the vertex domain. By exploiting the ordering property of semi-regular meshes, the convolution is defined in the vertex domain with strong motivation from the spatial definition of classic convolution. Moreover, a semi-regular mesh embedded in a 3D Euclidean space can be down-sampled at rates of 4, 16, 64, etc. We demonstrated the use of this vertex-based graph CNN for the classification of mild cognitive impairment (MCI) and Alzheimer's disease (AD) based on 3169 MRI scans of the Alzheimer's Disease Neuroimaging Initiative (ADNI). We compared the performance of the vertex-based graph CNN with that of the spectral graph CNN.


Tables

Table 1: Demographic and clinical information of the ADNI-2 cohort based on MRI scans.

	CON	EMCI	LMCI	AD
Number of subjects*	400	301	187	261
Number of scans	1122	865	595	587
Female/Male (scans)	607/515	395/470	268/327	254/333
Age (mean ± SD)	75.3 ± 6.8	72.6 ± 7.5	73.6 ± 8.0	75.3 ± 7.7

Table 2: Demographic and clinical information of the ADNI-1 cohort based on MRI scans.

	CON	MCI	AD
Number of subjects*	243	415	355
Number of scans	1067	1515	1016
Female/Male (scans)	493/574	525/990	443/573
Age (mean ± SD)	76.8 ± 5.3	75.9 ± 7.3	76.3 ± 7.2

Table 3: Comparison between the vertex-based CNN and the graph CNN in terms of accuracy (ACC), sensitivity (SEN), specificity (SPE), and geometric mean (GMean)

Model	Task	ACC (%)	SEN (%)	SPE (%)	GMean (%)
Vertex-based CNN	CON vs. AD	89.0 ± 0.6	86.4 ± 1.1	90.3 ± 1.1	88.4 ± 0.6
	CON vs. LMCI	73.3 ± 1.1	67.5 ± 2.5	76.4 ± 1.7	71.8 ± 1.2
	CON vs. EMCI	67.9 ± 2.6	67.0 ± 2.6	68.7 ± 2.5	67.8 ± 1.0
	EMCI vs. LMCI	55.6 ± 1.4	51.2 ± 2.5	58.7 ± 2.4	54.8 ± 1.4
	EMCI vs. AD	79.9 ± 1.4	75.2 ± 1.7	83.1 ± 1.7	79.0 ± 1.4
	LMCI vs. AD	65.4 ± 1.3	66.5 ± 2.2	64.4 ± 2.7	65.4 ± 1.4
Graph CNN	CON vs. AD	85.8 ± 0.8	83.5 ± 3.2	87.5 ± 2.8	85.4 ± 0.8
	CON vs. LMCI	69.3 ± 2.2	65.6 ± 7.6	72.0 ± 5.4	68.5 ± 3.0
	CON vs. EMCI	51.8 ± 1.2	55.3 ± 5.1	48.6 ± 6.4	53.5 ± 4.2
	EMCI vs. LMCI	60.9 ± 2.2	52.5 ± 8.8	67.8 ± 9.8	59.1 ± 1.4
	EMCI vs. AD	79.2 ± 2.6	70.4 ± 4.7	85.8 ± 4.7	77.6 ± 2.7
	LMCI vs. AD	65.2 ± 1.6	62.6 ± 5.2	68.0 ± 6.6	65.3 ± 1.4

Table 4: Classification performance on the ADNI-1 dataset in terms of accuracy (ACC), sensitivity (SEN), specificity (SPE), and geometric mean (GMean)

Task	ACC (%)	SEN (%)	SPE (%)	GMean (%)
CON vs. AD	88.9	82.3	95.2	88.5
CON vs. MCI	67.7	55.8	84.6	68.7
MCI vs. AD	65.2	80.6	54.9	66.5

Equations

(f \otimes h)[m] = \sum_{n \in \mathbb{Z}^{2}} f[n]\, h[m-n] = \sum_{n \in m-\Omega} f[n]\, h[m-n]

T = (\{x_{i}\}, \{\Sigma_{ijk}\}), \quad i, j, k \in \{1, \dots, N\},

\mathbf{P}_{i} = \{P[i,1], P[i,2], \dots, P[i,6]\}.

h = [h[0], h[1], h[2], \dots, h[6]]^{\top}.

(f \otimes h)[i] = \sum_{j=0}^{6} \widetilde{f}[i,j]\, h[j],

D=\left[\begin{array}{ccccc}\widetilde{f}[1,0]&\cdots&\widetilde{f}[i,0]&\cdots&\widetilde{f}[N,0]\\ \widetilde{f}[1,1]&\cdots&\widetilde{f}[i,1]&\cdots&\widetilde{f}[N,1]\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ \widetilde{f}[1,6]&\cdots&\widetilde{f}[i,6]&\cdots&\widetilde{f}[N,6]\end{array}\right]

f \otimes h : f \in \mathbb{R}^{N} \mapsto h^{\top} D \in \mathbb{R}^{N}.

f(x) = \max\{0, x\}, \quad x \in \mathbb{R}.

\{T^{(0)}, T^{(1)}, T^{(2)}, \dots, T^{(L)}\}

(v_{0}, w_{2}, w_{1}),\ (v_{1}, w_{0}, w_{2}),\ (v_{2}, w_{1}, w_{0}),\ (w_{0}, w_{1}, w_{2}).

T^{(0)} \to T^{(1)} \to T^{(2)} \to \cdots

\begin{array}{ll}\text{Mean pooling}:&f[i]^{(j)}\longrightarrow\frac{1}{7}\sum_{r\in\Omega_{i}^{(j+1)}}f[r]^{(j+1)};\\ \text{Max pooling}:&f[i]^{(j)}\longrightarrow\max_{r\in\Omega_{i}^{(j+1)}}f[r]^{(j+1)},\end{array}

M^{(\ell)} : \text{input} \to \text{Convolutions} \to \text{ReLU} \to \text{Mean pooling} \to \text{Output}


Taxonomy

Topics: Brain Tumor Detection and Classification · Dementia and Cognitive Impairment Research · Medical Image Segmentation Techniques

Methods: Convolution

Full text

Chaoqiang Liu¹, Hui Ji¹, Anqi Qiu¹

¹ National University of Singapore, Singapore

Abstract

We developed a convolutional neural network (CNN) on semi-regular triangulated meshes whose vertices have 6 neighbours. The key blocks of the proposed CNN, including convolution and down-sampling, are defined directly in the vertex domain. By exploiting the ordering property of semi-regular meshes, the convolution is defined in the vertex domain with strong motivation from the spatial definition of classic convolution. Moreover, a semi-regular mesh embedded in a 3D Euclidean space can be down-sampled at rates of 4, 16, 64, etc. We demonstrated the use of this vertex-based graph CNN for the classification of mild cognitive impairment (MCI) and Alzheimer’s disease (AD) based on 6767 MRI scans of the Alzheimer’s Disease Neuroimaging Initiative (ADNI). We compared the performance of the vertex-based graph CNN with that of the spectral graph CNN.

1 Introduction

Machine learning has been widely used as a crucial technique for medical image segmentation, registration, disease prediction, and classification, where image data are sampled on a Euclidean equi-spaced grid. However, the geometry of human organs is in general very complex, and it characterizes the intrinsic anatomical and physiological properties of the organs. For instance, myocardial contraction flows along the wall of the heart. Cortical thickness, which measures the depth of the cortical ribbon, is related to the convoluted gyri and sulci of the cortex: the cortex is thicker on gyri and thinner in sulci. Hence, it is preferable to represent brain image data in terms of their geometry, i.e., as data on meshes embedded in the Euclidean space of the brain image. Incorporating the geometric structure underlying image data has been shown to provide useful information to machine learning methods for disease diagnosis (e.g., [26, 9, 1, 20]).

In recent years, deep learning has become one of the most prevalent machine learning techniques for a wide range of image-related applications. Among the many architectures of deep neural networks, the convolutional neural network (CNN) has received great attention for its successes in computer vision (e.g., [23, 21, 25, 24, 14, 11]) and in medical imaging and diagnosis (e.g., [22, 18, 8, 15]). The main building blocks of a CNN are convolution with localized filters, a non-linear activation function, and pooling. These three blocks are concatenated sequentially to model highly non-linear intrinsic patterns of the training data and to output features for the target application. Most CNNs are developed for image data defined on equi-spaced regular grids. Generalizing such CNNs to image data defined on meshes embedded in a higher-dimensional Euclidean space is non-trivial, especially for the localized convolution and pooling operations.

1.1 CNN on general graphs

Meshes can be viewed as a special class of graphs, and several works have generalized CNNs to data on general graphs (e.g., [2, 3, 12]). Based on spectral graph theory, Henaff et al. [12] proposed a CNN for graph-structured data in which convolution is defined as a diagonal multiplicative operation in the graph Fourier domain, where the graph Fourier transform is derived from a normalized graph Laplacian. Localization of the convolution is imposed by regularizing the diagonal entries with a smoothness prior. To avoid the eigendecomposition of the graph Laplacian and obtain a better-localized convolution, Defferrard et al. [3] introduced a Chebyshev polynomial approximation, so that the resulting convolution operator is a polynomial of the (normalized) adjacency matrix of the graph. Kipf and Welling [13] further simplified the approximation to a first-order (linear) polynomial of the adjacency matrix and applied the resulting CNN to semi-supervised learning.
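For contrast with the vertex-domain approach, the Chebyshev filtering idea of [3] can be sketched in a few lines of NumPy. This is a simplified, single-channel illustration written for this note, not the released code of [3]; `lmax = 2` is used because 2 is the standard upper bound on the eigenvalues of a normalized Laplacian.

```python
import numpy as np


def chebyshev_filter(L, x, theta, lmax=2.0):
    """Apply sum_k theta[k] T_k(L~) x with Chebyshev polynomials T_k.

    L : (N, N) normalized graph Laplacian, x : (N,) signal,
    theta : K coefficients (a polynomial of degree K - 1, i.e. K parameters).
    L~ = 2 L / lmax - I rescales the spectrum into [-1, 1]; the recurrence
    T_0 x = x, T_1 x = L~ x, T_k x = 2 L~ T_{k-1} x - T_{k-2} x needs only
    matrix-vector products, so no eigendecomposition of L is required.
    """
    n = L.shape[0]
    L_tilde = 2.0 * L / lmax - np.eye(n)
    t_prev, t_curr = x, L_tilde @ x
    y = theta[0] * t_prev
    if len(theta) > 1:
        y = y + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * (L_tilde @ t_curr) - t_prev
        y = y + theta[k] * t_curr
    return y
```

Note that the whole filter is controlled by only `len(theta)` scalars, regardless of how many vertices it reaches.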

Nevertheless, convolution built on polynomials of the adjacency matrix is very different from classic convolution on equi-spaced grids. Consider such a generalized convolution given by a polynomial of degree $k$. It has only $k+1$ parameters, yet its support, i.e., the number of vertices it covers at each shift, is $3k^{2}+3k+1$ vertices on a regular triangular mesh. Classic CNNs for images on equi-spaced grids use small filters to collect as much local information as possible, and then gradually enlarge the receptive field and down-size the features to represent more global, higher-level information. A convolution that covers a large number of vertices with so few parameters might lose important local features that are helpful for modeling.
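The parameter/support mismatch can be checked numerically. The sketch below is a hypothetical illustration rather than code from the paper: it models an interior patch of a regular triangular mesh as an infinite triangular lattice in axial coordinates (an assumption) and counts the vertices reached by a degree-$k$ polynomial in the adjacency matrix $A$, using the fact that the non-zero entries of $A^{p}$ only link vertices at graph distance at most $p$.

```python
def hex_lattice_neighbors(v):
    """Six neighbours of vertex v = (q, r) on an infinite triangular lattice.

    Axial coordinates are an illustrative assumption; every vertex has
    valence 6, as in the interior of a semi-regular mesh.
    """
    q, r = v
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
    return [(q + dq, r + dr) for dq, dr in offsets]


def polynomial_filter_support(k):
    """Vertices touched at one vertex by sum_{p=0}^{k} theta_p A^p.

    The support is the ball of graph radius k around the origin, found by BFS.
    """
    frontier = {(0, 0)}
    seen = {(0, 0)}
    for _ in range(k):
        frontier = {n for v in frontier for n in hex_lattice_neighbors(v)} - seen
        seen |= frontier
    return len(seen)


for k in range(1, 5):
    params = k + 1                      # coefficients theta_0 .. theta_k
    support = polynomial_filter_support(k)
    assert support == 3 * k * k + 3 * k + 1
    print(f"k={k}: {params} parameters, support of {support} vertices")
```

It reports supports of 7, 19, 37 and 61 vertices for 2, 3, 4 and 5 parameters, respectively.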

Down-sizing the features is also an essential operation for a CNN to abstract higher-level information. It occurs in both pooling and convolution with stride $>1$. The graph coarsening procedure used for pooling in [3] relies on a multilevel weighted graph-cut method [4]. From the coarsest to the finest level, fake (i.e., disconnected) vertices are added to pair with the singletons so that each vertex has two children. These fake vertices artificially increase the dimensionality, and thus the computational cost, even though the number of singletons produced by multilevel clustering algorithms may not be large.
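The padding overhead described above follows from simple arithmetic if every vertex must have exactly two children at each of the $L$ coarsening levels. The helper below is a simplified model of that bookkeeping under this assumption (it is not the implementation of [3]), and the level sizes in the example are hypothetical.

```python
def padded_level_sizes(real_sizes):
    """Level sizes after fake-vertex padding in a pairwise coarsening hierarchy.

    real_sizes = [n_0, n_1, ..., n_L]: real vertex counts from the finest (0)
    to the coarsest (L) level, as produced by multilevel pairwise clustering.
    Padding from the coarsest level down so that every vertex has exactly two
    children gives n_L * 2**(L - l) vertices at level l.
    """
    L = len(real_sizes) - 1
    n_coarsest = real_sizes[-1]
    return [n_coarsest * 2 ** (L - l) for l in range(L + 1)]


real = [1000, 520, 270]            # hypothetical counts, for illustration only
padded = padded_level_sizes(real)
fake = [p - r for p, r in zip(padded, real)]
print(padded, fake)                # [1080, 540, 270] [80, 20, 0]
```

Every singleton at a coarse level is doubled again at each finer level, which is why even a modest number of singletons inflates the finest-level size.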

1.2 CNN on semi-regular triangular meshes

In this study, we propose a vertex-based CNN for modeling image data defined on semi-regular triangular meshes, which are well structured in terms of connectivity: most vertices have exactly $6$ neighbors. A semi-regular triangular mesh shares certain similarities with equi-spaced grids in Euclidean space. When image data are defined on a semi-regular triangular mesh, directly applying a generic graph CNN is sub-optimal, as it discards the specific connectivity properties of the mesh. These connectivity properties enable us to mimic convolution and down-sizing more faithfully, and thereby to avoid the issues of graph CNNs discussed in the previous section.

The key blocks of the proposed CNN, in particular convolution and down-sizing, are defined directly in the vertex domain. By exploiting the ordering property of semi-regular meshes, the convolution is defined in the vertex domain with strong motivation from the spatial definition of classic convolution. Moreover, down-sampling a semi-regular mesh is efficient: a semi-regular mesh embedded in a 3D Euclidean space can be down-sampled at rates of 4, 16, 64, etc. We demonstrated the use of this vertex-based graph CNN for the classification of mild cognitive impairment (MCI) and Alzheimer’s disease (AD) based on 3169 MRI scans of the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and compared its performance with that of the spectral graph CNN [3].

2 Methods

2.1 Convolution in the vertex domain

Consider a signal $f$ defined on an equi-spaced grid $\{k\}_{k\in{\mathbb{Z}}^{2}}$ , and a finite filter $h$ supported on a finite set ${\mathbf{\Omega}}\subset{\mathbb{Z}}^{2}$ . The convolution is then defined by

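$$(f\otimes h)[m]=\sum_{n\in\mathbb{Z}^{2}}f[n]\,h[m-n]=\sum_{n\in m-\mathbf{\Omega}}f[n]\,h[m-n]. \tag{1}$$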

Thus, at the vertex $m$, the value of $f\otimes h$ is a weighted sum of the values of $f$ over the neighbors of $m$, with weights given by $h$ and the neighborhood determined by ${\mathbf{\Omega}}$. In the following, we generalize this concept to a semi-regular triangular mesh, whose vertices generally have $6$ neighbors.
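A direct evaluation of the classic convolution sum over $m-\mathbf{\Omega}$ takes only a few lines. In this sketch, signals and filters are stored as Python dicts keyed by grid points, a representation chosen here for illustration; grid points absent from `f` are treated as zero.

```python
def conv2d_on_support(f, h, m):
    """Evaluate (f conv h)[m] = sum over n in m - Omega of f[n] * h[m - n].

    f, h : dicts mapping integer grid points (a, b) to values; h is non-zero
    only on its keys, so Omega = h.keys(). Points missing from f count as 0.
    """
    total = 0.0
    for (oa, ob), weight in h.items():      # o ranges over Omega
        n = (m[0] - oa, m[1] - ob)          # n = m - o, hence m - n = o
        total += f.get(n, 0.0) * weight
    return total


f = {(0, 0): 1.0, (1, 0): 2.0}
h = {(0, 0): 0.5, (1, 0): 0.25, (0, 1): 0.25}
print(conv2d_on_support(f, h, (1, 0)))     # 2.0*0.5 + 1.0*0.25 = 1.25
```

The loop runs over the filter support only, which is exactly the locality that the vertex-domain convolution below reproduces on a mesh.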

We begin with a semi-regular triangular mesh

$$T=(\{x_{i}\},\{\Sigma_{ijk}\}),\quad i,j,k\in\{1,\dots,N\},$$

where $\{x_{i}\}$ denotes the set of vertex coordinates, $\{\Sigma_{ijk}\}$ is the set of simplices, and $N$ is the total number of vertices of the mesh. Each simplex $\Sigma_{ijk}$ is a triple of vertex indices $(i,j,k)$, $i,j,k\in\{1,\dots,N\}$, that specifies the vertices of a triangular face; that is, the three vertices $x_{i}$, $x_{j}$ and $x_{k}$ are one-ring neighbours of each other. If $T$ is a semi-regular mesh, then for each vertex $x_{i}$ with six neighbours, the set of its neighbouring vertices is denoted by ${\mathbf{P}}_{i}\subset\{x_{i}\}$:

$${\mathbf{P}}_{i}=\{P[i,1],P[i,2],\dots,P[i,6]\}.$$
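In practice, the neighbour sets $\mathbf{P}_i$ can be read off the face list $\{\Sigma_{ijk}\}$. The following sketch (the helper name is hypothetical) collects, for each vertex, every vertex that shares a triangle with it; it does not yet put them in any particular order.

```python
from collections import defaultdict


def one_ring_neighbors(faces, num_vertices):
    """Return, for each vertex index, the sorted list of its 1-ring neighbours.

    faces : list of index triples (i, j, k), one per simplex Sigma_ijk.
    Two vertices are neighbours iff they appear together in some face.
    The lists are sorted by index, NOT in the clockwise order used later.
    """
    nbrs = defaultdict(set)
    for i, j, k in faces:
        nbrs[i].update((j, k))
        nbrs[j].update((i, k))
        nbrs[k].update((i, j))
    return [sorted(nbrs[v]) for v in range(num_vertices)]


# A hexagonal fan: centre vertex 0 surrounded by vertices 1..6.
fan = [(0, k, k % 6 + 1) for k in range(1, 7)]
print(one_ring_neighbors(fan, 7)[0])   # [1, 2, 3, 4, 5, 6]
```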

Ordering these vertices for the convolution is not straightforward. Fig. 1 illustrates how they are ordered in this study. We first define a sphere $\mathcal{S}_{i}$ (blue in Fig. 1) that passes through $x_{i}$ and approximates the mesh formed by ${\mathbf{P}}_{i}$. The tangent plane (orange in Fig. 1) of the mesh $T$ at $x_{i}$ is defined as the tangent plane of the sphere $\mathcal{S}_{i}$ at $x_{i}$. The local $xyz$-coordinate frame of this tangent plane (red in Fig. 1) is obtained by translating and rotating the coordinate frame of $\mathcal{S}_{i}$ (black in Fig. 1). We then order the six vertices clockwise, where $P[i,1]$ is the vertex whose projection onto the tangent plane is closest to the $x$-axis of the tangent plane at $x_{i}$.
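The ordering step can be sketched as follows, assuming the unit normal at $x_i$ (for example, from the fitted sphere $\mathcal{S}_i$) and the tangent-plane $x$-axis are already known; sphere fitting is omitted, and reading "clockwise" as seen from the side the normal points to is a convention assumed here.

```python
import math


def order_one_ring(x_i, neighbors, normal, x_axis):
    """Return indices into `neighbors`, ordered clockwise around x_i.

    Assumptions (illustrative, not the paper's code):
      * `normal` is a unit normal at x_i (e.g. from the fitted sphere S_i);
      * `x_axis` is a unit vector lying in the tangent plane at x_i;
      * "clockwise" means clockwise when viewed from the +normal side.
    The first index is the neighbour whose tangent-plane projection makes the
    smallest angle with the x-axis (ties are broken by input order).
    """
    def sub(a, b):
        return [a[k] - b[k] for k in range(3)]

    def dot(a, b):
        return sum(a[k] * b[k] for k in range(3))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    y_axis = cross(normal, x_axis)  # completes a right-handed tangent frame
    # Polar angle of each neighbour's projection onto the tangent plane.
    angles = []
    for idx, p in enumerate(neighbors):
        d = sub(p, x_i)
        angles.append((math.atan2(dot(d, y_axis), dot(d, x_axis)), idx))

    start_angle = min(angles, key=lambda t: abs(t[0]))[0]
    # Clockwise = decreasing polar angle, measured from the start direction.
    ordered = sorted(angles, key=lambda t: (start_angle - t[0]) % (2 * math.pi))
    return [idx for _, idx in ordered]


hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0) for k in range(6)]
print(order_one_ring((0.0, 0.0, 0.0), hexagon, (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)))
# [0, 5, 4, 3, 2, 1]
```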

Consider a $1$ -ring filter $h\in{\mathbb{R}}^{7}$ :

$$h=[h[0],h[1],h[2],\dots,h[6]]^{\top}.$$

Then, at the vertex $x_{i}$ , the value of a signal $f$ defined on $T$ convolved by $h$ is defined by

$$(f\otimes h)[i]=\sum_{j=0}^{6}\widetilde{f}[i,j]\,h[j], \tag{2}$$

where $\widetilde{f}[i,j]$ denotes the value of $f$ at the vertex $P[i,j]$, and $P[i,0]=x_{i}$.
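A plain-Python sketch of the 1-ring convolution, assuming the ordered neighbour indices of vertex $i$ are stored as a list `P[i]` (a hypothetical layout). Vertices with fewer than six neighbours simply contribute fewer terms, which amounts to treating the missing entries as zero.

```python
def vertex_conv(f, P, h):
    """1-ring vertex-domain convolution: out[i] = sum_j f~[i, j] * h[j].

    f : list of N signal values, one per vertex.
    P : list of N lists; P[i] holds the ordered neighbours P[i,1..6] of vertex i
        (P[i,0] = i is implied); shorter lists mean valence < 6.
    h : list of 7 filter weights h[0..6].
    """
    out = []
    for i in range(len(f)):
        ring = [i] + list(P[i])          # P[i,0] = x_i, then P[i,1..6]
        acc = 0.0
        for j, v in enumerate(ring[:7]):
            acc += f[v] * h[j]
        out.append(acc)
    return out


# Hexagonal fan: centre 0 with ordered ring 1..6; boundary vertices have valence 3.
P = [[1, 2, 3, 4, 5, 6], [0, 2, 6], [0, 3, 1], [0, 4, 2], [0, 5, 3], [0, 6, 4], [0, 1, 5]]
f = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(vertex_conv(f, P, [0, 1, 1, 1, 1, 1, 1]))   # [27.0, 11.0, 7.0, 9.0, 11.0, 13.0, 9.0]
```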

In matrix form, we define $D\in{\mathbb{R}}^{7\times N}$, whose $i$-th column collects the values $\widetilde{f}[i,0],\dots,\widetilde{f}[i,6]$, as

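$$D=\left[\begin{array}{ccccc}\widetilde{f}[1,0]&\cdots&\widetilde{f}[i,0]&\cdots&\widetilde{f}[N,0]\\ \widetilde{f}[1,1]&\cdots&\widetilde{f}[i,1]&\cdots&\widetilde{f}[N,1]\\ \vdots&\vdots&\vdots&\vdots&\vdots\\ \widetilde{f}[1,6]&\cdots&\widetilde{f}[i,6]&\cdots&\widetilde{f}[N,6]\end{array}\right]$$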

Then, the convolution defined in Eq. (2) can be expressed in the form of a matrix multiplication:

$$f\otimes h:\ f\in\mathbb{R}^{N}\mapsto h^{\top}D\in\mathbb{R}^{N}.$$

For a vertex of $T$ with valence less than $6$, the corresponding column of $D$ has undefined entries. As in classic convolution of finite signals, these entries can be defined by boundary extension; for example, setting them to 0 is equivalent to zero-padding boundary extension in classic convolution.
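The matrix form with zero padding vectorizes naturally. This NumPy sketch assumes the ordered neighbours are stored in an $(N, 6)$ integer array `P` in which `-1` marks a missing neighbour (an encoding chosen here, not taken from the paper); it gathers $D$ with a single fancy-indexing operation and returns $h^{\top}D$.

```python
import numpy as np


def build_D(f, P):
    """Gather the 7 x N matrix D whose column i is (f~[i,0], ..., f~[i,6]).

    f : length-N signal; P : (N, 6) ordered 1-ring indices with -1 marking a
    missing neighbour (valence < 6). Missing entries are filled with 0, i.e.
    zero-padding boundary extension.
    """
    f = np.asarray(f, dtype=float)
    P = np.asarray(P, dtype=int)
    n = f.shape[0]
    idx = np.concatenate([np.arange(n)[:, None], P], axis=1)  # (N, 7); column 0 is P[i,0] = i
    idx = np.where(idx < 0, n, idx)      # send missing neighbours to an extra slot
    f_pad = np.append(f, 0.0)            # the extra slot (index N) holds 0
    return f_pad[idx].T                  # (7, N)


def vertex_conv_matrix(f, P, h):
    """Convolve f with the 7-tap filter h as the product h^T D (length-N result)."""
    return np.asarray(h, dtype=float) @ build_D(f, P)
```

Because $D$ depends only on the signal and the fixed neighbour table, the same gather serves every filter, and a bank of filters becomes a single matrix product.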

The same procedure defines $2$-ring and larger convolutions. A $k$-ring convolution has $3k(k+1)+1$ parameters in total, and its support covers the same number of vertices. This is consistent with the behavior of classic convolution on equi-spaced grids. This localization property enables the CNN to extract very local features of data on semi-regular triangular meshes, just as a CNN does on equi-spaced grids. Such ring-type convolution has also been exploited in wavelet transforms for surface processing [5].
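For a regular vertex, the $r$-th ring contains $6r$ vertices, so the parameter count of a $k$-ring filter follows directly:

$$1+\sum_{r=1}^{k}6r \;=\; 1+3k(k+1), \qquad k=1,2,3 \;\Rightarrow\; 7,\ 19,\ 37.$$

A degree-$k$ adjacency polynomial with the same support has only $k+1$ parameters.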

2.2 Rectified Linear Unit and Pooling

CNNs can use many types of non-linear activation functions. An activation function is a map from ${\mathbb{R}}$ to ${\mathbb{R}}$ and does not involve any geometric property of the underlying structure. In our CNN for image data on a semi-regular mesh, we adopt the well-known rectified linear unit (ReLU):

$$f(x)=\max\{0,x\},\quad x\in\mathbb{R}.$$

In addition to convolution and ReLU, another important block is pooling, which can be viewed as a non-linear or linear down-sampling operation. Pooling reduces the size of the representation and thus the number of parameters, which improves memory usage and computational efficiency and helps control over-fitting. Pooling takes either the maximum or the average over the neighbors of each vertex that lies on the coarser grid or mesh. The key to defining a pooling operation on a mesh is how to define a hierarchical triangular mesh:

$$\{T^{(0)},T^{(1)},T^{(2)},\dots,T^{(L)}\}$$

such that the vertices of $T^{(j+1)}$ consist of all vertices of $T^{(j)}$ plus new vertices, and $T^{(L)}$ is the original mesh $T$ on which the image data are defined.

As the proposed CNN mainly aims at modeling brain image data, we first generate a hierarchical semi-regular triangular mesh whose finest level is the mesh extracted from the image data. Among the many approaches for generating a hierarchical triangular mesh, we adopt the one used in [17], which recursively applies a subdivision scheme to generate new vertices. Consider a triangle in the mesh $T^{(j)}$ with vertices $(v_{0},v_{1},v_{2})$. The triangle is subdivided into $4$ smaller triangles using $(w_{0},w_{1},w_{2})$, the midpoints of its three edges, where $w_{0}$, $w_{1}$ and $w_{2}$ lie on the edges opposite $v_{0}$, $v_{1}$ and $v_{2}$, respectively. The four new triangles are given by

$$(v_{0},w_{2},w_{1}),\ (v_{1},w_{0},w_{2}),\ (v_{2},w_{1},w_{0}),\ (w_{0},w_{1},w_{2}).$$

See Fig. 2 for an illustration.

Starting with an initial mesh at the coarsest level, recursively applying the subdivision scheme above leads to a hierarchical semi-regular triangular mesh.

$$T^{(0)}\to T^{(1)}\to T^{(2)}\to\cdots$$

The number of vertices at each level is approximately $4$ times that at the next coarser level (for a closed genus-0 mesh, $N_{j+1}=4N_{j}-6$). Every vertex $x_{i}^{(j)}$ of the $j$-th level mesh $T^{(j)}$ remains in the $(j+1)$-th level mesh $T^{(j+1)}$, and all of its $1$-ring neighbors in $T^{(j+1)}$ (six for a regular vertex) are new vertices not in $T^{(j)}$. For a vertex $x_{i}^{(j)}$ of $T^{(j)}$, let $\Omega_{i}^{(j+1)}$ denote the set consisting of this vertex and all of its $1$-ring neighbors in $T^{(j+1)}$. The pooling operator with stride $2$ is then defined as

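$$\begin{array}{ll}\text{Mean pooling}:&f[i]^{(j)}\longrightarrow\frac{1}{7}\sum_{r\in\Omega_{i}^{(j+1)}}f[r]^{(j+1)};\\ \text{Max pooling}:&f[i]^{(j)}\longrightarrow\max_{r\in\Omega_{i}^{(j+1)}}f[r]^{(j+1)},\end{array}$$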

where $f[i]^{(j)}$ denotes the value at the vertex $x_{i}$ of the $j$-th level mesh $T^{(j)}$. Similarly, pooling over larger windows is defined by running the same procedure on each vertex together with its $1$-ring and $2$-ring neighbors (and more) in the next finer mesh.
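Mean and max pooling can then be sketched as a gather over each coarse vertex's window $\Omega_i^{(j+1)}$. The sketch assumes the indexing produced by midpoint subdivision that appends new vertices (coarse vertex $i$ keeps index $i$ in the finer mesh), and it divides by the actual window size, which equals the $1/7$ of the text for a regular vertex but also handles lower-valence vertices; both are choices made here.

```python
def mesh_pool(f_fine, fine_neighbors, num_coarse, mode="mean"):
    """Pool a fine-level signal onto the coarse vertices of a subdivision hierarchy.

    Assumes coarse vertex i has index i in the fine mesh (i < num_coarse).
    fine_neighbors[i] lists the 1-ring neighbours of vertex i in the fine mesh,
    so the window Omega_i is {i} plus those neighbours.
    """
    out = []
    for i in range(num_coarse):
        window = [f_fine[i]] + [f_fine[r] for r in fine_neighbors[i]]
        if mode == "mean":
            out.append(sum(window) / len(window))
        elif mode == "max":
            out.append(max(window))
        else:
            raise ValueError(f"unknown pooling mode: {mode}")
    return out


# One subdivided triangle: coarse vertices 0, 1, 2; midpoints 3 (edge 1-2), 4 (edge 2-0), 5 (edge 0-1).
f_fine = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
nbrs = [[4, 5], [3, 5], [3, 4]]
print(mesh_pool(f_fine, nbrs, 3))              # [17.0, 14.0, 11.0]
print(mesh_pool(f_fine, nbrs, 3, mode="max"))  # [30.0, 30.0, 20.0]
```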

2.3 CNN on a semi-regular mesh

Based on the main ingredients presented in the previous sections, we propose a vertex-based CNN for analyzing image data defined on a semi-regular mesh, analogous to a classic CNN for image data on equi-spaced grids. The CNN is composed of $L+1$ sequentially connected blocks $M^{(1)},\ldots,M^{(L)},M^{(L+1)}$. The first $L$ blocks perform feature extraction. Each of them contains three sequentially concatenated layers: (1) a convolution layer with multiple $1$-ring convolutions; (2) a ReLU layer; and (3) a mean-pooling layer with stride 2:

$$M^{(\ell)}:\ \text{input}\to\text{Convolutions}\to\text{ReLU}\to\text{Mean pooling}\to\text{Output}$$

The last block $M^{(L+1)}$ is the classification layer, which uses the features extracted by the previous blocks.
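Combining the pieces, one feature-extraction block $M^{(\ell)}$ can be sketched in NumPy for multi-channel features. The filter-bank shape `(7, C_in, C_out)`, the bias term, the `-1` encoding for missing neighbours, and the precomputed pooling windows are all assumptions of this illustration, not details given in the text.

```python
import numpy as np


def one_ring_conv(X, P, W, b):
    """Multi-channel 1-ring convolution.

    X : (N, C_in) vertex features.
    P : (N, 6) ordered 1-ring indices, -1 = missing neighbour (zero padded).
    W : (7, C_in, C_out) filter bank, one 7-tap filter per channel pair.
    b : (C_out,) bias (assumed; the text does not mention biases).
    """
    X = np.asarray(X, dtype=float)
    P = np.asarray(P, dtype=int)
    n = X.shape[0]
    idx = np.concatenate([np.arange(n)[:, None], P], axis=1)   # (N, 7)
    X_pad = np.vstack([X, np.zeros((1, X.shape[1]))])          # zero row at index N
    G = X_pad[np.where(idx < 0, n, idx)]                       # (N, 7, C_in)
    return np.einsum("njc,jcd->nd", G, W) + b                  # (N, C_out)


def feature_block(X, P, W, b, pool_windows):
    """One block M^(l): convolution -> ReLU -> mean pooling.

    pool_windows[i] lists Omega_i: coarse vertex i plus its 1-ring neighbours
    in this (finer) mesh. Returns (len(pool_windows), C_out) features.
    """
    Y = np.maximum(one_ring_conv(X, P, W, b), 0.0)              # ReLU
    return np.stack([Y[w].mean(axis=0) for w in pool_windows])  # mean pooling
```

Stacking such blocks with growing `C_out` and shrinking pooling-window lists reproduces the coarse-to-fine structure of the network.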

Our implementation of the proposed CNN for classifying brain image data has 4 feature-extraction blocks and 1 classification block. The numbers of convolution filters in these blocks are [8, 16, 32, 64], respectively. The classification block is a fully connected layer with $512$ nodes followed by a softmax output. Fig. 3 shows the architecture of the proposed CNN. The $1$-ring convolution can be implemented as a matrix multiplication, and the other layers can be implemented using standard procedures. The CNN is trained with the Adam optimization algorithm and was implemented in TensorFlow. The code is available at http://bioeng.nus.edu.sg/cfa/download/GCNN.zip.
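A back-of-the-envelope parameter count for the four feature-extraction blocks, assuming one input channel (cortical thickness), one 1-ring convolution layer per block that connects every input channel to every output channel, and one bias per filter (all assumptions, not stated in the text):

```python
def conv_param_count(channels, taps=7, bias=True):
    """Number of weights in a stack of 1-ring convolution layers.

    channels = [C_0, C_1, ..., C_L]; layer l maps C_{l-1} -> C_l channels using
    one `taps`-tap filter per (input, output) channel pair, plus one bias per
    output channel when bias=True.
    """
    total = 0
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        total += taps * c_in * c_out + (c_out if bias else 0)
    return total


# Assumed: 1 input channel (cortical thickness), then [8, 16, 32, 64] filters.
print(conv_param_count([1, 8, 16, 32, 64]))              # 18992
print(conv_param_count([1, 8, 16, 32, 64], bias=False))  # 18872
```

The size of the 512-node fully connected layer depends on how many vertices remain after four stride-2 poolings (roughly $N/4^{4}$ for an input mesh of $N$ vertices), which the text does not state, so it is left out of this count.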

3 Results

3.1 MRI Data and Analysis

ADNI cohort. This study used data from the ADNI-1 and ADNI-2 cohorts. The ADNI-2 cohort included 1149 subjects (400 cognitively normal (CON), 301 early MCI (EMCI), 187 late MCI (LMCI), and 261 AD). The ADNI-1 cohort included 1013 subjects with 3598 scans (243/1067 CON, 415/1515 MCI, and 355/1016 AD, as subjects/scans). The number of visits per subject varied from 1 to 7 (baseline, 3, 6, 12, 24, 36, and 48 months). At each visit, subjects were diagnosed with one of the clinical statuses based on the criteria described in the ADNI-2 protocol (http://adni.loni.usc.edu). The general diagnostic criteria for early and late MCI were the same, except that LMCI subjects had a lower cut-off score on the Logical Memory II subscale of the Wechsler Memory Scale. The demographic and clinical information of subjects from ADNI-2 and ADNI-1 is provided in Table 1 and Table 2, respectively.

MRI Data Analysis. All T1-weighted images were segmented using FreeSurfer [10]. The processed images were quality-checked according to the criteria listed at https://surfer.nmr.mgh.harvard.edu/fswiki/FsTutorial/TroubleshootingData. We represented cortical thickness on the cortical surface generated by FreeSurfer. We employed a large deformation diffeomorphic metric mapping (LDDMM) algorithm [28, 6] to align individual cortical surfaces to the atlas and transferred each subject's cortical thickness to the atlas.

As each subject may have multiple MRI scans, one per visit, this study included all available T1-weighted images that passed quality checks after processing. We used the clinical status at the time of MRI acquisition as the classification ground truth. For instance, a subject with multiple scans may have different clinical labels if he/she converted from one clinical status to another over time. Of the 3365 scans from ADNI-2, we discarded 196 scans with missing demographic information or missing diagnosis labels (CON, EMCI, LMCI, or AD), resulting in 3169 scans used in the following CNN analysis. Of the 3783 scans from ADNI-1, we discarded 185 scans with missing demographic information or diagnosis labels (CON, MCI, or AD), resulting in 3598 scans used below.

3.2 Comparison with the Graph CNN

In this experiment, we compared the computational cost and classification accuracy of the proposed vertex-based CNN and the graph CNN [3] on the ADNI-2 data. The graph CNN [3] had 3 convolutional layers with [8, 16, 32] filters, respectively, and a final fully connected layer with $128$ nodes. Its convolution was approximated using Chebyshev polynomials of order 3. The network parameters were trained with a mini-batch size of 64, an initial learning rate of $10^{-3}$, a weight decay of 0.05, and a momentum of 0.9. During training, $\ell_{2}$-norm regularization with weight $5\times10^{-4}$ was applied to the weights of the final fully connected layer to prevent overfitting to the training data. This study employed 10-fold cross-validation in which all scans from the same subject were assigned to the same fold (validation or testing) to avoid data leakage in the predictive model. We determined hyperparameters, such as the number of layers, the number of filters, and the learning rate, based on the geometric mean (GMean $=\sqrt{SEN\times SPE}$), where $SEN$ and $SPE$ denote sensitivity and specificity, respectively. We chose this measure because maximizing it favors high accuracy on each of the two classes while keeping sensitivity and specificity close, i.e., balanced performance on both the positive and negative classes.
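The evaluation protocol above can be sketched as follows: GMean is computed from sensitivity and specificity, and folds are formed at the subject level so that no subject's scans are split across folds. The round-robin assignment of shuffled subjects is an illustrative choice; the text does not say how subjects were allocated to folds.

```python
import math
import random
from collections import defaultdict


def gmean(sen, spe):
    """Geometric mean of sensitivity and specificity: sqrt(SEN * SPE)."""
    return math.sqrt(sen * spe)


def subject_level_folds(scan_subject_ids, n_folds=10, seed=0):
    """Split scans into folds so that every scan of a subject lands in one fold.

    scan_subject_ids[s] is the subject ID of scan s. Subjects (not scans) are
    shuffled and dealt round-robin into folds; fold sizes in scans may differ.
    Returns n_folds lists of scan indices.
    """
    scans_of = defaultdict(list)
    for s, subj in enumerate(scan_subject_ids):
        scans_of[subj].append(s)
    subjects = sorted(scans_of)
    random.Random(seed).shuffle(subjects)
    folds = [[] for _ in range(n_folds)]
    for k, subj in enumerate(subjects):
        folds[k % n_folds].extend(scans_of[subj])
    return folds
```

Splitting by subject rather than by scan matters here because longitudinal scans of one person are highly correlated; a scan-level split would let the model see near-duplicates of its test data.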

We performed the same procedure for six classifiers: CON vs. AD, CON vs. LMCI, CON vs. EMCI, EMCI vs. LMCI, EMCI vs. AD, and LMCI vs. AD. The experiments were run on a Tesla M40 GPU (24 GB memory). The computational time per epoch was 40 s for our vertex-based CNN and 113 s for the graph CNN [3], i.e., our approach was $2.83$ times faster. Table 3 lists the classification accuracy, sensitivity, specificity, and GMean of each classifier. Our vertex-based CNN outperformed the graph CNN on most classifiers (CON vs. AD, CON vs. LMCI, CON vs. EMCI, EMCI vs. AD, and LMCI vs. AD), the exception being EMCI vs. LMCI. In addition, our vertex-based approach generally showed lower variability across the four evaluation measures, most notably in sensitivity and specificity. These findings suggest that the proposed CNN is computationally fast and has the potential to improve classification accuracy compared with the graph CNN [3].

3.3 Application to ADNI-1

In this experiment, we applied the CON vs. LMCI, CON vs. AD, and LMCI vs. AD classifiers trained on the ADNI-2 cohort to the ADNI-1 cohort. Table 4 lists the resulting classification performance for MCI and AD, which is comparable to that in Table 3. This suggests that the classifiers built on the ADNI-2 cohort are robust when applied to another dataset.

4 Discussion

This paper presented a vertex-based CNN on meshes, in particular on semi-regular triangulated meshes. We showed that the convolution operation on semi-regular triangulated meshes has a translation property similar to that of convolution in Euclidean space, and that the pooling operation on these meshes is analogous to that of classic CNNs in Euclidean space. We applied this approach to the ADNI-2 data and compared its performance with that of the graph CNN [3]. Our vertex-based CNN was faster than the graph CNN. This is partly because the graph coarsening procedure used for pooling in the graph CNN [3] adds fake vertices to pair with singletons so that each vertex has two children, which increases the data dimension processed by the CNN. In contrast, pooling in our vertex-based CNN uses stride 2, analogous to down-sampling in Euclidean space. Moreover, compared with the graph CNN, our vertex-based CNN improved classification accuracy for five of the six classifiers, the exception being EMCI vs. LMCI. One potential limitation of our approach is that it requires semi-regular meshes. In general, constructing semi-regular meshes for medical image data is not an issue; however, our approach does not apply to general graph data, such as social or citation networks.

In the past decade, many studies have reported classification among CON, MCI, and AD based on the ADNI dataset (e.g., [16, 19, 7, 29, 27]). Some of them used multi-modal brain images and reported higher classification accuracy than that in Table 3 (e.g., [19, 7, 29, 27]). However, their sample sizes were relatively small, so the robustness of their results is unclear. Nevertheless, our approach can easily be extended to a multi-channel vertex-based CNN that handles multi-modal or multiple structural measures, such as diffusion properties of the cortex, cortical surface area, and hippocampal shape. Compared with existing studies based on cortical thickness (about 85% accuracy) [16], our approach achieved the highest classification accuracy. To the best of our knowledge, our experiment used the largest image dataset available in the ADNI-2 cohort, which suggests its potential robustness to other AD datasets.

5 Acknowledgements

We would like to thank the National Supercomputing Centre Singapore for providing computing resources for this study. This study was supported by the Institute of Data Science at the National University of Singapore.

Bibliography

[1] Apostolova, L.G., Dinov, I.D., Dutton, R.A., Hayashi, K.M., Toga, A.W., Cummings, J.L., Thompson, P.M.: 3D comparison of hippocampal atrophy in amnestic mild cognitive impairment and Alzheimer's disease. Brain 129, 2867–2873 (2006)
[2] Bruna, J., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013)
[3] Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. pp. 3844–3852. NIPS (2016)
[4] Dhillon, I.S., Guan, Y., Kulis, B.: Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(11), 1944–1957 (2007)
[5] Dong, B., Jiang, Q., Liu, C., Shen, Z.: Multiscale representation of surfaces by tight wavelet frames with applications to denoising. Applied and Computational Harmonic Analysis 41(2), 561–589 (2016)
[6] Du, J., Younes, L., Qiu, A.: Whole brain diffeomorphic metric mapping via integration of sulcal and gyral curves, cortical surfaces, and images. NeuroImage 56(1), 162–173 (2011)
[7] Dyrba, M., Barkhof, F., Fellgiebel, A., Filippi, M., Hausner, L., Hauenstein, K., et al.: Predicting prodromal Alzheimer's disease in subjects with mild cognitive impairment using machine learning classification of multimodal multicenter diffusion-tensor and magnetic resonance imaging data. Journal of Neuroimaging 25, 738–747 (2015)
[8] Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115 (2017)