ASD-DiagNet: A hybrid learning approach for detection of Autism Spectrum Disorder using fMRI data

Taban Eslami; Vahid Mirjalili; Alvis Fong; Angela Laird; Fahad Saeed

arXiv:1904.07577·cs.LG·April 17, 2019


TL;DR

ASD-DiagNet is a hybrid machine learning framework that uses fMRI data, autoencoders, and data augmentation to improve autism spectrum disorder detection accuracy and efficiency across diverse datasets.

Contribution

The paper introduces ASD-DiagNet, a novel hybrid learning approach combining autoencoders and data augmentation for more accurate and faster ASD classification from fMRI data.

Findings

1. Achieved up to 20% higher accuracy than existing methods.

2. Reduced training time from 6 hours to 40 minutes.

3. Validated on a large, multi-center dataset with 1035 subjects.


Tables (5)

Table I: Class membership information of the ABIDE-I dataset for each individual site

Site	Caltech	CMU	KKI	Leuven	MaxMun	NYU	OHSU	OLIN	PITT	SBL	SDSU	Stanford	Trinity	UCLA	UM	USM	Yale
ASD	19	14	20	29	24	75	12	19	29	15	14	19	22	54	66	46	28
Healthy control	18	13	28	34	28	100	14	15	27	15	22	20	25	44	74	25	28

Table II: Classification performance (%) using 10-fold cross-validation on the whole dataset. Note that our proposed approach, ASD-DiagNet (with data augmentation), achieves the highest accuracy among the compared methods.

Method	Accuracy (%)	Sensitivity (%)	Specificity (%)
ASD-DiagNet	70.1	67.8	72.8
ASD-DiagNet (no aug.)	69.2	66.4	73.1
SVM	60.3	35	84.4
Random Forest	63	54.9	71.3
Heinsfeld et al. [11]	65.4	69.3	61

Table III: Classification accuracy (%) using 5-fold cross-validation on individual data centers using our proposed method, ASD-DiagNet (with and without data augmentation), compared with other existing methods.

Site	ASD-DiagNet	ASD-DiagNet (no aug.)	Ref. [11]	SVM	Random Forest
Caltech	51.4	49.2	52.3	48.5	55.4
CMU	63.6	62.5	45.3	60	64.6
KKI	70.6	66.6	58.2	58.2	67.6
Leuven	59	57.2	51.8	53.9	57.5
MaxMun	48.3	48	54.3	53.8	45.8
NYU	68.5	66.1	64.5	57.1	62.3
OHSU	80	65.33	74	54	54.4
Olin	64.7	61.33	44	55.7	53.4
Pitt	68	66.8	59.8	51.8	60.87
SBL	53	52.3	46.6	50	47.6
SDSU	63.9	63	63.6	61.1	61.9
Stanford	62.5	61.5	48.5	51.4	60.1
Trinity	52.9	53.3	61	53.3	52.6
UCLA	72	71.3	57.7	55.1	69.3
USM	69	64	62	64.7	64.7
UM	64.2	64.7	57.6	52.8	63.5
Yale	63.2	61.3	53	57.6	58.2
Average	63.2	60.8	56.1	55.1	59.8

Table IV: Running time for 10-fold cross-validation (training and evaluation) on the whole dataset.

Method	Running time
ASD-DiagNet	$41.14$ min
ASD-DiagNet (no aug.)	$20.5$ min
SVM	$3$ min
Random forest	$1$ min
Heinsfeld et al. [11]	$6$ hr

Table V: Classification accuracy (%) using other parcellations of brain fMRI data: AAL and Dosenbach160. Note that our proposed method, ASD-DiagNet, outperforms existing techniques using both atlases.

Method	AAL	Dosenbach160
ASD-DiagNet	67.8	65
ASD-DiagNet (no augmentation)	65.6	64.3
Heinsfeld et al. [11]	65.8	63.8
SVM	59.3	51.7
Random forest	62.6	58.6

Equations (8)

$$\rho_{uv}=\frac{\sum_{t=1}^{T}(u_{t}-\bar{u})(v_{t}-\bar{v})}{\sqrt{\sum_{t=1}^{T}(u_{t}-\bar{u})^{2}}\,\sqrt{\sum_{t=1}^{T}(v_{t}-\bar{v})^{2}}}$$

$$h_{enc}=\phi_{enc}(x)=\tau(W_{enc}x+b_{enc})$$

$$x^{\prime}=\phi_{dec}(h_{enc})=W_{dec}h_{enc}+b_{dec}$$

$$\begin{array}{rl}f(x)&=\sigma\left(W_{slp}h_{enc}+b_{slp}\right)\\ &=\sigma\left(W_{slp}\tau(W_{enc}x+b_{enc})+b_{slp}\right)\end{array}$$

$$\mathcal{H}(y,f(x))=-\left(y\log f(x)+(1-y)\log\left(1-f(x)\right)\right)$$

$$\hat{y}=\begin{cases}1,&\text{if }f(x)\geq 0.5\\ 0,&\text{otherwise}\end{cases}$$

$$p^{\prime}=\alpha\times p+(1-\alpha)\times q_{r}$$

$$\begin{array}{rl}EROS(A,B,w)&=\sum_{i=1}^{n}w_{i}\left|\langle a_{i},b_{i}\rangle\right|\\ &=\sum_{i=1}^{n}w_{i}\left|\cos\theta_{i}\right|\end{array}$$


Code & Models

Repositories

pcdslab/ASD-DiagNet (PyTorch, official)


Full text

ASD-DiagNet: A hybrid learning approach for detection of Autism Spectrum Disorder using fMRI data

Taban Eslami, Vahid Mirjalili, Alvis Fong, Angela Laird, and Fahad Saeed∗

T. Eslami and A. Fong are with the Department of Computer Science, Western Michigan University, Kalamazoo, MI, 49008. E-mail: taban.eslami,[email protected]. V. Mirjalili is with the Department of Computer Science and Engineering, Michigan State University, Lansing, MI, 48824. E-mail: [email protected]. A. Laird is with the Department of Physics, Florida International University, Miami, FL, 33199. E-mail: [email protected]. F. Saeed is with the School of Computing and Information Science, Florida International University, Miami, FL, 33199. ∗ Corresponding E-mail: [email protected]

Abstract

Mental disorders such as Autism Spectrum Disorders (ASD) are heterogeneous disorders that are notoriously difficult to diagnose, especially in children. The current psychiatric diagnostic process is based purely on the behavioural observation of symptomology (DSM-5/ICD-10) and may be prone to over-prescribing of drugs due to misdiagnosis. In order to move the field toward a more quantitative approach, we need advanced and scalable machine learning infrastructure that will allow us to identify reliable biomarkers of mental health disorders. In this paper, we propose a framework called ASD-DiagNet for distinguishing subjects with ASD from healthy subjects using only fMRI data. We designed and implemented a joint learning procedure using an autoencoder and a single layer perceptron, which results in improved quality of extracted features and optimized parameters for the model. Further, we designed and implemented a data augmentation strategy, based on linear interpolation on available feature vectors, that allows us to produce the synthetic data needed for training machine learning models. The proposed approach is evaluated on a public dataset provided by the Autism Brain Imaging Data Exchange, including 1035 subjects from 17 different brain imaging centers. Our machine learning model outperforms other state-of-the-art methods on data from 13 imaging centers, with an increase in classification accuracy of up to 20% and a maximum accuracy of 80%. In addition to yielding better quality, the technique presented in this paper gives a large advantage in execution time (40 minutes vs. 6 hours for other methods). The implemented code is available under the GPL license on the GitHub portal of our lab (https://github.com/pcdslab/ASD-DiagNet).

Index Terms:

fMRI, ASD, SLP, Autoencoder, ABIDE, Classification, Data augmentation

I Introduction

Mental disorders such as Autism Spectrum Disorders (ASD) are heterogeneous disorders that are notoriously difficult to diagnose, especially in children. The current psychiatric diagnostic process is based purely on behavioural observation of symptomology (DSM-5/ICD-10) and may be prone to misdiagnosis [1]. There is no quantitative test that can be prescribed to a patient and lead to a definite diagnosis. Such quantitative and definitive tests are regular practice for other diseases such as diabetes, HIV, and hepatitis-C. It is widely known that defining and diagnosing mental health disorders is a difficult process due to the overlapping nature of symptoms and the lack of a biological test that can serve as a definite and quantified gold standard [2]. ASD is a lifelong neuro-developmental brain disorder that causes social impairments such as repetitive behaviour and communication problems in children. More than $1\%$ of children suffer from this disorder, and detecting it at an early age can be beneficial. Studies show that demographic attributes such as gender and race are distributed differently among ASD and healthy individuals; for example, males are four times more prone to ASD than females [3].

Quantitative analysis of brain imaging data can provide valuable biomarkers that result in more accurate diagnosis of brain diseases. Machine learning techniques using brain imaging data (e.g., Magnetic Resonance Imaging (MRI) and functional Magnetic Resonance Imaging (fMRI)) have been extensively used by researchers for diagnosing brain disorders such as Alzheimer’s, ADHD, MCI, and autism [4, 5, 6, 7, 8, 9, 10].

In this paper, we focus on distinguishing subjects with Autism Spectrum Disorders (ASD) from healthy control subjects using fMRI data. We propose a method called ASD-DiagNet, which consists of an autoencoder and a single layer perceptron. These networks are used for extracting lower dimensional features in a hybrid manner, and the trained perceptron is used for the final round of classification. In order to enlarge the training set, we designed a data augmentation technique that generates new data in feature space from the available training data. Based on the experimental results, ASD-DiagNet achieved $70.1\%$ classification accuracy, which outperforms the current state-of-the-art technique [11]. Further, we show that ASD-DiagNet scales extremely well with increasing size of the data and takes only $41$ minutes to run, compared to the $6$ hours needed by other methods [11]. Average accuracy on individual sites is $63\%$, which is $7\%$ better than the result reported by [11]. Our machine learning technique will allow greater quantification of ASD diagnosis and is a step toward making early diagnosis and treatment a priority.

The structure of this paper is as follows. In the next section, we review the state of the art in the field. In Section III, we explain the ASD-DiagNet method in detail. In Section IV, we describe the experimental setting and discuss the results of ASD-DiagNet. Finally, in Section V, we conclude the paper and discuss future directions.

II Background Information and Literature Review

Detecting ASD using fMRI data has recently gained a lot of attention, thanks to the Autism Brain Imaging Data Exchange (ABIDE) initiative, which provides functional and structural brain imaging datasets collected from several brain imaging centers around the world [12]. Many studies and methods have been developed based on ABIDE data [11, 13, 14, 15]. Some studies used a subset of this dataset, selected on specific demographic information, to analyze their proposed method. For example, Iidaka [13] used a probabilistic neural network to classify resting state fMRI (rs-fMRI) data from $312$ ASD and $328$ healthy control subjects (only subjects under $20$ years old were selected) and achieved around $90\%$ accuracy. In another work, Plitt et al. [16] used two sets of rs-fMRI data from the ABIDE dataset, one containing $118$ male individuals ($59$ ASD; $59$ typically developing, TD) and the other containing $178$ age- and IQ-matched individuals ($89$ ASD; $89$ TD), and achieved $76.67\%$ accuracy.

Besides using fMRI data, some studies also included structural and demographic information of subjects for diagnosing ASD. Parisot et al. [17] proposed a framework based on Graph Convolutional Networks that achieved $70.4\%$ accuracy. In their work, they represented the population as a graph in which nodes are defined by imaging features and edge weights are determined by phenotypic information. Sen et al. [18] proposed a new algorithm that combines structural and functional features from MRI and fMRI data and obtained $64.3\%$ accuracy using a total of $1111$ healthy and ASD subjects. Nielsen et al. [19] obtained $60\%$ accuracy on a group of $964$ healthy and ASD subjects using the functional connectivity between $7266$ regions and demographic information such as age, gender, and handedness.

Machine learning techniques such as Support Vector Machines (SVM) and Random Forests have been explored in multiple studies [20, 21, 15, 22]. For instance, Chen et al. [14] investigated the effect of different frequency bands for constructing the brain functional network, and obtained $79.17\%$ accuracy using an SVM applied to $112$ ASD and $128$ healthy control subjects.

Recently, neural networks and deep learning methods such as autoencoders, Deep Neural Networks (DNN), Long Short-Term Memory (LSTM) networks, and Convolutional Neural Networks (CNN) have also become very popular for diagnosing ASD [23, 24, 25, 26, 27, 28]. Brown et al. [25] obtained $68.7\%$ classification accuracy on $1013$ subjects, composed of $539$ healthy controls and $474$ subjects with ASD, by proposing an element-wise layer for deep neural networks that incorporates data-driven structural priors.

Most recently, Heinsfeld et al. [11] used a deep learning based approach and achieved $70\%$ accuracy for classifying $1035$ subjects ($505$ ASD and $530$ controls). They claimed this approach improved on the state-of-the-art technique. In their technique, distinct pairwise Pearson’s correlation coefficients were used as features. Two stacked denoising autoencoders were first pre-trained in order to extract lower dimensional data. After training the autoencoders, their weights were applied to a multi-layer perceptron classifier (fine-tuning process), which was used for the final classification. They also performed classification separately for each of the $17$ sites included in the ABIDE dataset, and reported an average accuracy of $52\%$. The low performance on individual sites was attributed to the lack of enough training samples for intra-site training.

Generally, most related studies on ASD diagnosis using machine learning techniques have only considered a subset of the ABIDE dataset, or have incorporated other information besides fMRI data in their model. Few studies, such as [11], have used only fMRI data without any assumption on demographic information and analyzed all $1035$ subjects in the ABIDE dataset. To the best of our knowledge, [11] is currently the state-of-the-art technique for ASD diagnosis on the whole ABIDE dataset, and we use it as the baseline for evaluating our proposed method.

III Materials and methods

III-A Functional Magnetic Resonance Imaging and ABIDE dataset

Functional Magnetic Resonance Imaging (fMRI) is a brain imaging technique that is used for studying brain activity [29, 30]. In fMRI data, the brain volume is represented by a group of small cubic elements called voxels. A time series is extracted from each voxel by keeping track of its activity over time. Scanning the brain using fMRI technology while the subject is resting is called resting state fMRI (rs-fMRI), which is widely used for analyzing brain disorders. In this study, we used the preprocessed ABIDE-I dataset provided by the ABIDE initiative. This dataset consists of rs-fMRI data from $1112$ subjects, including ASD and healthy subjects, collected from $17$ different sites. We used the fMRI data of the same group of subjects that was used in [11]. This set consists of $505$ subjects with ASD and $530$ healthy controls from all $17$ sites. Table I shows the class membership information for each site.

ABIDE-I provides the average time series extracted from seven sets of regions of interest (ROIs), based on seven different atlases, preprocessed using four different pipelines. The data used in our experiments is preprocessed using the C-PAC pipeline [12] and is parcellated into $200$ functionally homogeneous regions generated using a spatially constrained spectral clustering algorithm [31] (CC-200). The preprocessing steps include slice time correction, motion correction, nuisance signal removal, removal of low-frequency drifts, and voxel intensity normalization. It is worth mentioning that each site used different parameters and protocols for scanning the data. Parameters such as repetition time (TR), echo time (TE), number of voxels, number of volumes, and whether the eyes were open or closed during scanning differ among sites.

III-B ASD-DiagNet: Feature extraction and classification

Functional connectivity between brain regions is an important concept in fMRI analysis and has been shown to contain discriminative patterns for fMRI classification. Among correlation measures, Pearson’s correlation is the one most commonly used for approximating functional connectivity in fMRI data [32, 33, 34]. It measures the linear relationship between the time series of two different regions. Given two time series, $u$ and $v$, each of length $T$, Pearson’s correlation can be computed using the following equation:

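$$\rho_{uv}=\frac{\sum_{t=1}^{T}(u_{t}-\bar{u})(v_{t}-\bar{v})}{\sqrt{\sum_{t=1}^{T}(u_{t}-\bar{u})^{2}}\,\sqrt{\sum_{t=1}^{T}(v_{t}-\bar{v})^{2}}}$$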

where $\bar{u}$ and $\bar{v}$ are the means of time series $u$ and $v$, respectively. Computing all pairwise correlations results in a correlation matrix $\mathcal{C}_{m\times m}$, where $m$ is the number of time series (or regions). Due to the symmetry of Pearson’s correlation, we only considered the strictly upper triangular part of the correlation matrix. Since we used the CC-200 atlas, in which the brain is parcellated into $m=200$ regions, there are $m\times(m-1)/2=19900$ distinct pairwise Pearson’s correlations. From these, we selected half of the correlations, comprising the $1/4$ largest and $1/4$ smallest values, and eliminated the rest. To do so, we first compute the average of each correlation over all subjects in the training set and then pick the indices of the largest positive and negative values in the averaged correlation array. We then take the correlations at those indices from each sample as its feature vector. Keeping half of the correlations and eliminating the rest reduces the size of the input features by a factor of 2. There is no restriction on the number of high- and anti-correlations that should be kept. Removing more features results in higher computational efficiency and reduces the chance of overfitting; however, removing too many features can also cause the loss of important patterns.
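The selection step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function names `upper_triangle_correlations` and `select_feature_indices`, the toy data shapes, and the rounding down of each quarter with `int()` are all choices made for the example.

```python
import numpy as np

def upper_triangle_correlations(ts):
    """ts has shape (T, m): one column per region. Returns m*(m-1)/2 correlations."""
    corr = np.corrcoef(ts, rowvar=False)       # (m, m) Pearson correlation matrix
    iu = np.triu_indices(corr.shape[0], k=1)   # strictly upper triangle
    return corr[iu]

def select_feature_indices(train_corrs, keep_fraction=0.25):
    """train_corrs has shape (n_subjects, n_pairs).

    Averages each correlation over the training subjects, then returns the
    indices of the smallest and largest averaged values (one quarter each).
    """
    mean_corr = train_corrs.mean(axis=0)
    order = np.argsort(mean_corr)              # ascending order
    n_keep = int(len(order) * keep_fraction)   # assumption: round down
    return np.concatenate([order[:n_keep], order[len(order) - n_keep:]])

# Toy data: 8 hypothetical subjects, T=50 time points, m=6 regions.
rng = np.random.default_rng(0)
subjects = [rng.standard_normal((50, 6)) for _ in range(8)]
corrs = np.stack([upper_triangle_correlations(s) for s in subjects])
idx = select_feature_indices(corrs)            # computed on training data only
features = corrs[:, idx]                       # per-subject feature vectors
```

With $m=200$ regions the same procedure would keep $2\times 4975=9950$ of the $19900$ correlations. The indices are derived from the training set alone, so the same `idx` would be reused to build test features.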

In order to further reduce the size of features, we used an autoencoder to extract a lower dimensional feature representation. An autoencoder is a type of feed-forward neural network model, which first encodes its input $x$ to a lower dimensional representation,

$$h_{enc}=\phi_{enc}(x)=\tau(W_{enc}x+b_{enc})$$

where $\tau$ is the hyperbolic tangent activation function ($\tanh$), and $W_{enc}$ and $b_{enc}$ represent the weight matrix and the bias of the encoder. Then, the decoder reconstructs the original input data

$$x^{\prime}=\phi_{dec}(h_{enc})=W_{dec}h_{enc}+b_{dec}$$

where $W_{dec}$ and $b_{dec}$ are the weight matrix and bias of the decoder. In this work, we have designed an autoencoder with tied weights, which means $W_{dec}=W_{enc}^{\top}$. An autoencoder can be trained to minimize its reconstruction error, computed as the Mean Squared Error (MSE) between $x$ and its reconstruction, $x^{\prime}$. We chose an autoencoder over other feature extraction techniques such as PCA because it can reduce the dimensionality of the features in a non-linear way. The structure of an autoencoder is shown in Fig. 1.

The lower dimensional data generated during the encoding process contains useful patterns from the original input data with smaller size, and can be used as new features for classification. For the classification task, we used a single layer perceptron (SLP) which uses the bottleneck layer of the autoencoder, $h_{enc}$ , as input, and computes the probability of a sample belonging to the ASD patient class using a sigmoid activation function, $\sigma$ ,

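$$\begin{array}{rl}f(x)&=\sigma\left(W_{slp}h_{enc}+b_{slp}\right)\\ &=\sigma\left(W_{slp}\tau(W_{enc}x+b_{enc})+b_{slp}\right)\end{array}$$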

where $W_{slp}$ and $b_{slp}$ are the weight matrix and the bias for the SLP network. The SLP network can be trained by minimizing the Binary Cross Entropy loss, $\mathcal{H}$ , using the ground-truth class label, $y$ , and the estimated ASD probability for each sample, $f(x)$ :

$$\mathcal{H}(y,f(x))=-\left(y\log f(x)+(1-y)\log\left(1-f(x)\right)\right)$$

Finally, the predicted class label is determined by thresholding the estimated probability

$$\hat{y}=\begin{cases}1,&\text{if }f(x)\geq 0.5\\ 0,&\text{otherwise}\end{cases}$$

Typically, an autoencoder is first fully trained to minimize its reconstruction error, and the features from the bottleneck layer, $h_{enc}$, are then used as input for training the SLP classifier separately. In contrast, here we train the autoencoder and the SLP classifier simultaneously. This can potentially yield low dimensional features that have two properties:

1. they are useful for reconstructing the original data, and

2. they contain discriminative information for the classification task.

This is accomplished by adding the two loss functions, i.e. MSE loss for reconstruction, and Binary Cross Entropy for the classification task, and training both networks jointly. After the joint training process is completed, we further fine-tune the SLP network for a few additional epochs, while parameters of the autoencoder are frozen.
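As a rough illustration of the joint objective, the forward pass and the summed loss can be written in NumPy as below. This is only a sketch under stated assumptions: the function name `joint_loss`, the toy dimensions, the random initial weights, and the small `eps` added inside the logarithms for numerical safety are not from the paper, and no gradient step or fine-tuning stage is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_loss(x, y, W_enc, b_enc, b_dec, w_slp, b_slp, eps=1e-12):
    """x: (n, d_in) features, y: (n,) labels in {0, 1}, W_enc: (d_hidden, d_in)."""
    h = np.tanh(x @ W_enc.T + b_enc)   # encoder output h_enc, shape (n, d_hidden)
    x_rec = h @ W_enc + b_dec          # tied weights: W_dec = W_enc^T
    p = sigmoid(h @ w_slp + b_slp)     # SLP estimate of the ASD probability f(x)
    mse = np.mean((x - x_rec) ** 2)    # reconstruction loss
    bce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return mse + bce, mse, bce, p      # the two losses are simply added

# Toy example with hypothetical sizes.
rng = np.random.default_rng(0)
n, d_in, d_hidden = 4, 10, 3
x = rng.standard_normal((n, d_in))
y = np.array([1.0, 0.0, 1.0, 0.0])
W_enc = 0.1 * rng.standard_normal((d_hidden, d_in))
w_slp = 0.1 * rng.standard_normal(d_hidden)
total, mse, bce, p = joint_loss(x, y, W_enc, np.zeros(d_hidden),
                                np.zeros(d_in), w_slp, 0.0)
y_hat = (p >= 0.5).astype(int)         # thresholded class prediction
```

In an actual training loop, the gradient of the summed loss would update the encoder, decoder bias, and SLP parameters together, so the bottleneck features are shaped by both terms.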

III-C Data augmentation using linear interpolation

Machine learning, and especially deep learning, techniques are most effective when they are provided with enough training data. Insufficient data causes overfitting and poor generalization of the model [35]. Large training sets are not always available, and collecting new data can be costly, as in the medical imaging field. In these situations, data augmentation techniques can be used for generating synthetic data from the available training set [36, 37, 38, 39, 40]. The data augmentation technique that we propose in this study is inspired by the Synthetic Minority Over-sampling Technique (SMOTE) [41]. SMOTE is an effective technique for oversampling the minority class of imbalanced datasets. It generates synthetic data in feature space by using the nearest neighbors of a sample. After the k nearest neighbors of sample $p$ are found ($\{q_{1},q_{2},...,q_{k}\}$), a random neighbor $q_{r}$ is selected and the synthetic feature vector is computed using the following equation:

$$p^{\prime}=\alpha\times p+(1-\alpha)\times q_{r}$$

In this equation, $\alpha$ is a random number selected uniformly in the range $[0,1]$. In our implementation, we chose $\alpha$ randomly within the range $[0.5,1]$, so that the synthesized sample is closer to $p$. Finding the nearest neighbors of a sample requires a distance or similarity metric. In our work, the samples have feature vectors of size $9950$ (half of the correlations). One option for computing nearest neighbors is the Euclidean distance; however, computing pairwise Euclidean distances with $9950$ features is not efficient. To compute the similarity between samples and find the nearest neighbors, we used a measure called Extended Frobenius Norm (EROS). This measure computes the similarity between two multivariate time series (MTS) [42]. fMRI data consists of several regions, each with its own time series, so it can be treated as a multivariate time series. Our previous study on ADHD has shown that EROS is an effective similarity measure for fMRI data and that using it with k-Nearest-Neighbor achieves high classification accuracy [5]. This motivated us to use it as part of the data augmentation process. EROS computes the similarity between two MTS items $A$ and $B$ based on the eigenvalues and eigenvectors of their covariance matrices using the following equation:

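$$\begin{array}{rl}EROS(A,B,w)&=\sum_{i=1}^{n}w_{i}\left|\langle a_{i},b_{i}\rangle\right|\\ &=\sum_{i=1}^{n}w_{i}\left|\cos\theta_{i}\right|\end{array}$$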

where $a_{i}$ and $b_{i}$ are the $i$-th eigenvectors of the covariance matrices of multivariate time series $A$ and $B$, and $\cos\theta_{i}$ is the cosine of the angle between them. Furthermore, $w$ is the weight vector, which is computed from the eigenvalues of all MTS items using Algorithm 1. This algorithm computes $w$ by normalizing the eigenvalues of each MTS item, applying an aggregate function $f$ (here, the mean) to the eigenvalues over the entire training dataset, and finally normalizing the result so that $\sum_{i=1}^{n}w_{i}=1$.

In order to further reduce the time needed for computing the pairwise similarities, we considered using the first two eigenvectors of each sample. Our experiments showed that this simplification does not affect the results while reducing the running time significantly compared to using all eigenvectors and eigenvalues.
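The EROS computation and its weight vector can be sketched with NumPy as follows. This is an illustrative reading of the description above, not the authors' implementation: the helper names `eig_sorted`, `eros_weights`, and `eros` are hypothetical, the toy data is random, and the text does not say whether the weights are renormalized when only the first two eigenvectors are used, so this sketch leaves them as they are.

```python
import numpy as np

def eig_sorted(ts):
    """ts has shape (T, m). Eigen-decomposition of the m x m covariance matrix,
    with eigenvalues (and matching eigenvector columns) sorted largest first."""
    cov = np.cov(ts, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def eros_weights(all_vals):
    """all_vals has shape (n_items, m). Normalize the eigenvalues of each item,
    aggregate with the mean over items, then normalize so the weights sum to 1."""
    normalized = all_vals / all_vals.sum(axis=1, keepdims=True)
    w = normalized.mean(axis=0)
    return w / w.sum()

def eros(vecs_a, vecs_b, w, n_components=None):
    """Weighted sum of |<a_i, b_i>| over the first n_components eigenvector pairs."""
    k = len(w) if n_components is None else n_components
    cosines = np.abs(np.sum(vecs_a[:, :k] * vecs_b[:, :k], axis=0))
    return float(np.sum(w[:k] * cosines))

# Toy data: 6 hypothetical MTS items, T=40 time points, m=5 regions.
rng = np.random.default_rng(0)
items = [rng.standard_normal((40, 5)) for _ in range(6)]
decomps = [eig_sorted(ts) for ts in items]
w = eros_weights(np.stack([vals for vals, _ in decomps]))
sim_01 = eros(decomps[0][1], decomps[1][1], w)
sim_00 = eros(decomps[0][1], decomps[0][1], w)                    # item with itself
sim_top2 = eros(decomps[0][1], decomps[1][1], w, n_components=2)  # first two only
```

Because the eigenvectors have unit length and the weights sum to one, the full similarity lies between 0 and 1, and an item compared with itself scores 1.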

Using EROS as the similarity measure, our data augmentation process is shown in Algorithm 2. After finding the $k=5$ nearest neighbors of each sample $i$ in the training set, one of them is randomly selected and a new sample is generated by linear interpolation between the selected neighbor and sample $i$. Using this approach, one synthetic sample is created for each training point, which doubles the size of the training set. Fig. 2 shows the data augmentation process and Fig. 3 shows the overall process of the ASD-DiagNet method.
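The interpolation step can be sketched as below, given any precomputed similarity matrix. This is a minimal sketch, not Algorithm 2 itself: the function name `augment` is hypothetical, and the toy example uses negative Euclidean distance as a stand-in similarity purely to keep the snippet short, whereas the method described above uses EROS.

```python
import numpy as np

def augment(features, similarity, k=5, rng=None):
    """features: (n, d) training feature vectors.
    similarity: (n, n) matrix where larger values mean more similar samples.
    Returns one synthetic sample per training sample and the chosen neighbors."""
    rng = np.random.default_rng() if rng is None else rng
    n = features.shape[0]
    synthetic = np.empty_like(features, dtype=float)
    chosen = np.empty(n, dtype=int)
    for i in range(n):
        sim = np.array(similarity[i], dtype=float)
        sim[i] = -np.inf                      # never pick the sample itself
        neighbors = np.argsort(sim)[-k:]      # the k most similar samples
        r = rng.choice(neighbors)             # random neighbor q_r
        alpha = rng.uniform(0.5, 1.0)         # keeps the new point closer to p
        synthetic[i] = alpha * features[i] + (1 - alpha) * features[r]
        chosen[i] = r
    return synthetic, chosen

# Toy example: 10 hypothetical samples with 4 features each.
rng = np.random.default_rng(0)
features = rng.standard_normal((10, 4))
diff = features[:, None, :] - features[None, :, :]
similarity = -np.linalg.norm(diff, axis=2)    # stand-in for an EROS matrix
synthetic, chosen = augment(features, similarity, k=5, rng=rng)
augmented = np.vstack([features, synthetic])  # training set doubles in size
```

Each synthetic sample would inherit the class label of the sample it was generated from only if the neighbor search is restricted to the same class; how labels are assigned is not covered by this sketch.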

IV Experiments and results

For all the experiments reported in this section, we used a Linux server running the Ubuntu operating system. The server contains two Intel Xeon E5-2620 processors at $2.40$ GHz with a total of $48$ GB of RAM. The system contains an NVIDIA Tesla K-40c GPU with $2880$ CUDA cores and $12$ GB of RAM. CUDA version $8$ and the PyTorch library were used for conducting the experiments.

We evaluated the ASD-DiagNet model in two phases. In the first phase, the model was evaluated using all $1035$ subjects from all sites, and in the second phase, the model was evaluated for each site separately. As stated earlier, data centers may have used different experimental parameters for scanning fMRI images, so pooling all of them shows how our model generalizes to data with heterogeneous scanning parameters. On the other hand, when each data center is considered separately, fewer subjects are available for training the model, and the results indicate how it performs on small datasets. In each of these experiments, the effect of data augmentation was evaluated. The following subsections explain each experiment in more detail.

IV-A Phase 1: Experiments using the whole dataset

In this phase, we performed 10-fold cross-validation on all $1035$ subjects. Table II compares the accuracy, sensitivity, and specificity of our approach with the method proposed by Heinsfeld et al. [11], random forest, and an SVM classifier with RBF kernel. SVM and random forest were trained using the $19900$ pairwise Pearson’s correlations for each subject. As the results show, ASD-DiagNet achieves $70.1\%$ accuracy, outperforming the other methods. The proposed data augmentation improves the results by around $1\%$. (Footnote 1: Heinsfeld et al. [11] reported $70\%$ accuracy in their paper; the accuracy reported here is the result of running their method on our system using their default parameters and the code they provided online. The difference could be due to some details missing from the implementation.)
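For reference, the three metrics compared in Table II follow from the confusion matrix as in the sketch below. The function name `classification_metrics` and the toy labels are illustrative, and treating ASD as the positive class (label 1) is an assumption based on the classifier output described in Section III-B.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # ASD predicted as ASD
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # control predicted as control
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # fraction of ASD subjects detected
    specificity = tn / (tn + fp)   # fraction of controls correctly identified
    return accuracy, sensitivity, specificity

# Toy fold: 4 ASD subjects and 6 controls (hypothetical labels).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
acc, sens, spec = classification_metrics(y_true, y_pred)
```

In a 10-fold cross-validation these values would be computed on each held-out fold and then averaged. The SVM row of Table II shows why all three are reported: a high specificity can coexist with a low sensitivity.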

IV-B Phase 2: Intra-site evaluation

In this phase, we performed 5-fold cross-validation on each site separately. The accuracy of each method is provided in Table III. Based on these results, our method achieves the highest accuracy in most cases and outperforms the other methods on average. In addition, the proposed data augmentation improves the result by around $2\%$ overall. The improvement is largest for OHSU, where data augmentation increases the accuracy by about $15\%$.

IV-C Running time

The running time needed for performing 10-fold cross-validation by different approaches is shown in Table IV. The training and evaluation for all methods are performed on the same Linux system (described in Section IV).

Based on the results in Table IV, ASD-DiagNet performs significantly faster than [11]. The data augmentation doubles the size of the training set by generating one artificial sample per subject in the training set. As a result, the data augmentation increases the computation time by a factor of $2$ .

IV-D Experiment on other parcellations

We tested ASD-DiagNet on two other ROI atlases besides CC-200. The first parcellation is based on the Automated Anatomical Labeling (AAL) atlas, in which the brain is parcellated into $116$ regions using the AAL toolbox. The other atlas, Dosenbach160, parcellates the brain into $160$ regions. The data for these parcellations is also provided in the ABIDE dataset. Dosenbach160 and AAL contain $12720$ and $6670$ pairwise correlations, respectively. As with the CC-200 atlas, half of the correlations (the $1/4$ largest and $1/4$ smallest values, with the remaining intermediate values removed) are selected as input features to the model. The resulting average accuracy of 10-fold cross-validation on the whole dataset using different approaches for AAL and Dosenbach160 is shown in Table V.
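The pair counts quoted above follow directly from $m(m-1)/2$, as this short check illustrates. The helper name `n_pairs` is made up for the example, and rounding each quarter down is an assumption, since the text does not say how a non-integer quarter (as for AAL) is handled.

```python
def n_pairs(m):
    """Number of distinct region pairs in the strictly upper triangle of an m x m matrix."""
    return m * (m - 1) // 2

atlases = {"CC-200": 200, "Dosenbach160": 160, "AAL": 116}
totals = {name: n_pairs(m) for name, m in atlases.items()}
# Assumption: each quarter (largest and smallest values) is rounded down.
kept = {name: 2 * (total // 4) for name, total in totals.items()}

for name in atlases:
    print(f"{name}: {totals[name]} pairwise correlations, about {kept[name]} kept")
```

For CC-200 this gives $19900$ correlations and $9950$ retained features, matching the feature vector size stated in Section III-C.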

Based on the results in Table V, our proposed method, with and without the augmentation process, performs better than existing methods. Note that the classification accuracy obtained using these parcellations is below the accuracy obtained using the CC-200 atlas, which suggests that the pairwise correlations among CC-200 regions contain more discriminative patterns than those of the AAL and Dosenbach160 atlases.

V Conclusion and future work

In this paper, we targeted the problem of distinguishing subjects with ASD from healthy subjects. We used fMRI data provided by the ABIDE consortium, which has been collected from different brain imaging centers. No demographic information is assumed or used in this study. Our approach, called ASD-DiagNet, uses the most correlated and anti-correlated connections of the brain as feature vectors and an autoencoder to extract lower dimensional patterns from them. The autoencoder and a single layer perceptron are trained jointly to perform feature selection and classification. We also proposed a data augmentation method to increase the number of samples using the available training set. We tested this method by performing 10-fold cross-validation on the whole dataset and achieved $70.1\%$ accuracy in $40$ minutes. The running time of our approach is significantly shorter than the $6$ hours needed by the state-of-the-art method, while achieving higher classification accuracy. In another experiment, we evaluated our method by performing 5-fold cross-validation on each data center separately. The average result shows a significant improvement in accuracy compared to the state-of-the-art method. In this case, data augmentation improves the accuracy by around $2\%$. These results demonstrate that our approach can be used, in a reasonable amount of time, both for intra-site brain imaging data, which are usually small sets generated in research centers, and for bigger multi-site datasets such as ABIDE.

Funding

This research was supported by the National Institute of General Medical Sciences (NIGMS), NIH Award Number R15GM120820, and the National Science Foundation (NSF) under Award Numbers NSF CRII CCF-1464268, NSF CRII CCF-1855441, NSF CAREER ACI-1651724 and NSF OAC 1925960. The content is solely the responsibility of the authors and does not necessarily represent the official views of governmental agencies.

Bibliography (42)


[1] R. E. Nickel and L. Huang-Storms, “Early identification of young children with autism spectrum disorder,” The Indian Journal of Pediatrics, vol. 84, no. 1, pp. 53–60, 2017.
[2] “Attention deficit hyperactivity disorder: diagnosis and management of ADHD in children, young people and adults.” National Collaborating Centre for Mental Health (UK), British Psychological Society, 2018.
[3] J. Baio, L. Wiggins, D. L. Christensen, M. J. Maenner, J. Daniels, Z. Warren, M. Kurzius-Spencer, W. Zahorodny, C. R. Rosenberg, T. White et al., “Prevalence of autism spectrum disorder among children aged 8 years—autism and developmental disabilities monitoring network, 11 sites, United States, 2014,” MMWR Surveillance Summaries, vol. 67, no. 6, p. 1, 2018.
[4] E. Hosseini-Asl, G. Gimel’farb, and A. El-Baz, “Alzheimer’s disease diagnostics by a deeply supervised adaptable 3D convolutional network,” arXiv preprint arXiv:1607.00556, 2016.
[5] T. Eslami and F. Saeed, “Similarity based classification of ADHD using singular value decomposition,” in Proceedings of the ACM International Conference on Computing Frontiers 2018. ACM, 2018, pp. 19–25.
[6] A. Khazaee, A. Ebrahimzadeh, A. Babajani-Feremi, A. D. N. Initiative et al., “Classification of patients with MCI and AD from healthy controls using directed graph measures of resting-state fMRI,” Behavioural Brain Research, vol. 322, pp. 339–350, 2017.
[7] Z. Yang, S. Zhong, A. Carass, S. H. Ying, and J. L. Prince, “Deep learning for cerebellar ataxia classification and functional score regression,” in International Workshop on Machine Learning in Medical Imaging. Springer, 2014, pp. 68–76.
[8] X. Peng, P. Lin, T. Zhang, and J. Wang, “Extreme learning machine-based classification of ADHD using brain structural MRI data,” PLoS ONE, vol. 8, no. 11, p. e79476, 2013.