Ensemble of 3D CNN regressors with data fusion for fluid intelligence prediction
Marina Pominova, Anna Kuzina, Ekaterina Kondrateva, Svetlana Sushchinskaya, Maxim Sharaev, Evgeny Burnaev, and Vyacheslav Yarkin

TL;DR
This paper develops an ensemble of 3D CNN regressors with data fusion to predict children's fluid intelligence scores from MRI images, reaching an MSE of 92.838 on the blind test set.
Contribution
It introduces an advanced VoxCNN ensemble architecture that effectively combines features and deep learning for brain-based intelligence prediction.
Findings
Achieved an MSE of 92.838 on blind test data.
Demonstrated the effectiveness of ensemble deep learning models.
Validated approach on a large, long-term brain development dataset.
Abstract
In this work, we aim to predict children's fluid intelligence scores from structural T1-weighted MR images from the largest long-term study of brain development and child health. The target variable was regressed on the data collection site, socio-demographic variables and brain volume, and is thus independent of potentially informative factors that are not directly related to brain functioning. We investigate both feature extraction and deep learning approaches, as well as different deep CNN architectures and their ensembles. We propose an advanced VoxCNN ensemble architecture, which yields an MSE of 92.838 on the blind test set.

Table 1. MSE on the validation set for the investigated models.

| # | Model architecture | MSE |
|---|---|---|
| 1 | Brain morphometry | 71.293 |
| 2 | VoxCNN on brain T1 imagery | 71.777 |
| 3 | VoxCNN on 3D segmented brain mask | 72.094 |
| 4 | Ensemble: VoxCNNs on T1 and segmented mask | 71.314 |
| 5 | Ensemble: VoxCNNs on T1, segmented mask with morphology features | 70.635 |

Table 2. MSE on the test set for the ensemble models.

| # | Model architecture | MSE |
|---|---|---|
| 1 | Ensemble: VoxCNNs on T1 and segmented mask | 92.8378 |
| 2 | Ensemble: VoxCNNs on T1, segmented mask with morphology features | 94.0808 |
Skolkovo Institute of Science and Technology, Moscow, Russia. Email: [email protected]
Keywords: MRI analysis, fluid intelligence prediction, deep learning, 3D convolutional neural networks
1 Introduction
Understanding cognitive development in children may improve their health outcomes through adolescence. Thus, determining the neural mechanisms underlying general intelligence is a critical task. Fluid intelligence is one of the two discrete factors of general intelligence.
Fluid intelligence is the capacity to think logically and solve problems in novel situations, independent of acquired knowledge. It involves the ability to identify patterns and relationships that underpin novel problems and to extrapolate these findings using logic [Car93].
Prior research has addressed fluid intelligence prediction based on different brain imaging techniques and extracted features [ZLL18], [PLN+16]. However, these studies did not identify robust biomarkers or methods for predicting fluid intelligence scores.
Deep learning approaches and convolutional neural networks, in particular, have shown high potential on imagery classification, recognition and processing and thus could be considered useful for fluid intelligence scores prediction based on MRI data (3D brain images).
The advantage of deep learning methods is the ability to automatically derive complex and informative features from the raw data during the training process. That allows training a neural network directly on high-dimensional 3D brain imaging data skipping the feature extraction step.
By design, neural architectures for deep learning are built in a modular way, with basic building blocks, such as composite convolutional layers, typically reused across many models and applications. This enables the standardization of deep learning architectures, with much research devoted to the exploration of pre-built layers and pre-trained activations (for transfer learning, image retrieval, etc.). However, the choice of an appropriate architecture for specific clinical applications, such as cognitive potential prediction or pathology classification, remains an open problem and requires further investigation.
In the present study we carry out an extensive experimental evaluation of deep voxelwise neural network architectures for fluid intelligence scores prediction based on MRI data with multimodal input structure.
The article has the following structure. In Section 2 we overview deep network architectures used for MRI data processing. In Section 3 we present the training dataset and our deep network architecture. We describe obtained results in Section 4, provide discussions in Section 5 and draw conclusions in Section 6.
2 Related work
There are a number of successful applications of convolutional neural networks (CNNs) with different architectures to the segmentation of MRI data. Many of these solutions adapt existing approaches for analyzing 2D images to the processing of three-dimensional data.
For example, for brain segmentation, an architecture similar to ResNet [HZRS16] was proposed, which extends deep residual learning to volumetric MRI data by using 3D filters in convolutional layers. The model, called VoxResNet [CDY+18], consists of volumetric residual blocks (VoxRes blocks), containing convolutional layers as well as several deconvolutional layers. The authors demonstrated the potential of ResNet-like volumetric architectures, achieving better results than many modern methods of MRI image segmentation [MNA16]. Convolutional neural networks have also shown good classification results in problems associated with neuropsychiatric diseases such as Alzheimer's disease.
A recently proposed classification model with a VGG-like architecture, called VoxCNN, was used for neurodegenerative disease classification [HAGEB16]. Its results were more accurate than or comparable to those of earlier approaches that use previously extracted lower-dimensional morphometric brain characteristics [SAA+18, SAK+18, ISA+18].
This indicates that convolutional networks can be applied directly to raw neuroimaging data without loss of model performance or over-fitting, which allows skipping the pre-processing step.
However, to the best of our knowledge, there has not been much work on the use of convolutional networks for predicting fluid intelligence from MRI data.
3 Materials and Methods
3.1 Data set
The training data set is provided by the ABCD Neurocognitive Prediction Challenge (ABCD-NP-Challenge 2019, https://sibis.sri.com/abcd-np-challenge/). The data consist of T1-weighted MRI images for about four thousand individuals (aged 9-10 years) and the corresponding sociodemographic variables [HHM+18]. The participants' fluid intelligence scores (4154 subjects, 3739 for training and 415 for validation) are also provided.
3.2 Target processing
The fluid intelligence scores were pre-residualized on the data collection site, sociodemographic variables and brain volume. To this end, a linear regression model was fitted with fluid intelligence as the dependent variable and brain volume, data collection site, age at baseline, sex at birth, race/ethnicity, highest parental education, parental income, and parental marital status as independent variables [HHM+18].
The obtained residuals are used as targets to be predicted by a regression model.
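The residualization step can be illustrated with a minimal pure-Python sketch. It assumes ordinary least squares with an intercept; the helper names (`solve_linear_system`, `residualize`) and the toy numbers are hypothetical and are not taken from the challenge pipeline, which additionally encodes categorical covariates such as the collection site.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residualize(scores, covariates):
    """Return scores minus their least-squares fit on the covariates."""
    # Design matrix with a leading column of ones for the intercept.
    design = [[1.0] + [float(v) for v in row] for row in covariates]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, scores)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    return [y - f for y, f in zip(scores, fitted)]


# Toy data (hypothetical): columns are brain volume in litres and sex coded 0/1.
covariates = [[1.1, 0], [1.3, 1], [1.2, 0], [1.4, 1], [1.0, 0], [1.5, 1]]
scores = [98.0, 105.0, 101.0, 110.0, 95.0, 108.0]
residuals = residualize(scores, covariates)
print([round(r, 3) for r in residuals])
```

By construction the residuals sum to zero and are uncorrelated with every covariate, which is what makes the regression target independent of the listed factors.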
3.3 MRI data processing
The imagery dataset consists of skull-stripped images affinely aligned to the SRI24 atlas [RZSP10], segmented into regions of interest (ROIs) according to the atlas, and the corresponding volume scores of each ROI [PKB+17]. T1-weighted MRI was transformed according to the Minimal Processing Pipeline by ABCD [HHM+18].
The cross-sectional component of the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA) pipeline [BBT+15] was applied to the T1 images. The steps included noise removal and field inhomogeneity correction confined to the brain mask, defined by non-rigidly aligning the SRI24 atlas to the T1w MRI via Advanced Normalization Tools (ANTS) [ATS09].
The brain mask was refined by majority voting across maps extracted by FSL BET [Smi02], AFNI 3dSkullStrip [Cox96], FreeSurfer mri_gcut [SZCZ10], and the Robust Brain Extraction (ROBEX) method [ILTT11], which were applied to combinations of bias-corrected and non-bias-corrected T1w images. Using the refined mask, image inhomogeneity correction was repeated and the skull-stripped T1w image was segmented into brain tissue (gray matter, white matter, and cerebrospinal fluid) via Atropos [ATW+11]. Gray matter tissue was further parcellated according to the SRI24 atlas, which was non-rigidly registered to the T1w image via ANTS.
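As an illustration of the majority-voting idea, the following sketch combines several binary brain masks voxel by voxel. The function name, the flattened toy masks and the tie-breaking rule (a voxel is kept only on a strict majority) are assumptions made for this example; the pipeline's actual implementation is not described in the text.

```python
def majority_vote(masks):
    """Combine binary masks voxel-wise; keep a voxel if more than half vote 1."""
    n_masks = len(masks)
    return [1 if 2 * sum(votes) > n_masks else 0 for votes in zip(*masks)]


# Four hypothetical brain-extraction outputs over five voxels (flattened).
mask_a = [1, 1, 0, 0, 1]
mask_b = [1, 0, 0, 1, 1]
mask_c = [1, 1, 0, 0, 0]
mask_d = [0, 1, 0, 1, 1]

refined = majority_vote([mask_a, mask_b, mask_c, mask_d])
print(refined)  # [1, 1, 0, 0, 1]
```

The fourth voxel receives two of four votes, so under the strict-majority assumption it is excluded from the refined mask.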
3.4 Specifications of the investigated models
We use an ensemble of deep neural networks with the VoxCNN architecture [KSBD17, PAS+18] to solve the regression problem. This architecture has already been applied successfully to brain image analysis tasks. We enhanced it to provide better convergence and stronger regularization.
VoxCNN networks are similar to the VGG [SZ14] architecture, which is popular for 2D image classification. VoxCNN applies 3D convolutions to deal with three-dimensional MRI brain scans.
The proposed network consists of four blocks of two convolutional layers each, where every 3D convolution is followed by batch normalization and a ReLU activation function [ESH19]. The number of filters in the convolutional layers starts from 16 in the first block and doubles with each subsequent block. The filters of the very first layer are applied with a stride of 2 to reduce the dimension of the original image. Our experiments have shown that this step does not reduce the network performance but helps to speed up convergence and meet the limitations of GPU memory. The blocks are separated by max-pooling layers. We also apply 3D dropout after each pooling layer to promote independence between feature maps and reduce over-fitting [TGJ+15].
Next, feature maps extracted by the convolutional layers are fed into the fully connected layer with 1024 hidden units, batch-normalization, ReLU activation, and dropout regularization, and then to the final layer with a single unit without non-linearity.
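To make the layer layout concrete, the sketch below walks through the feature-map shapes of the convolutional stack described above. The 3x3x3 kernels with padding 1, the pooling window of 2, the two-channel 128x128x128 input and all function names are assumptions chosen for illustration; the text fixes only the number of blocks, the filter counts and the stride of the first layer.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def voxcnn_shapes(in_channels=2, spatial=(128, 128, 128), n_blocks=4, base_filters=16):
    """Return (layer name, (channels, D, H, W)) after each stage of the conv stack."""
    shapes = []
    channels, dims = in_channels, tuple(spatial)
    for block in range(n_blocks):
        filters = base_filters * 2 ** block  # 16, 32, 64, 128
        for layer in range(2):
            # Only the very first convolution uses stride 2.
            stride = 2 if (block == 0 and layer == 0) else 1
            dims = tuple(conv_out(d, stride=stride) for d in dims)
            channels = filters
            shapes.append((f"block{block + 1}.conv{layer + 1}+BN+ReLU", (channels, *dims)))
        if block < n_blocks - 1:
            # Max-pooling (window 2, assumed) and 3D dropout separate the blocks.
            dims = tuple(d // 2 for d in dims)
            shapes.append((f"block{block + 1}.pool+dropout3d", (channels, *dims)))
    return shapes


shapes = voxcnn_shapes()
final_channels, *final_dims = shapes[-1][1]
flat_features = final_channels * final_dims[0] * final_dims[1] * final_dims[2]

for name, shape in shapes:
    print(name, shape)
print("features fed to the 1024-unit fully connected layer:", flat_features)
```

Under these assumptions the stride-2 first layer halves each spatial axis at once, which shrinks every later activation by a factor of eight and is consistent with the stated goal of fitting into GPU memory.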
It was previously shown that an auxiliary tower backpropagates the classification loss earlier in the network, serving as an additional regularization mechanism [SLJ+15, SVI+16].
Therefore, an auxiliary output was added to the network to provide better training of the deeper layers. For this purpose, feature maps from intermediate layers are fed to a separate fully connected layer to produce another target prediction, which is then added to the main network output with an adjusted weight. In our case, the output of the third block of convolutional layers was used to compute the auxiliary prediction, which was averaged with the main output with weights 0.4 and 0.6, respectively.
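The weighted combination of the two heads amounts to a simple convex average. A minimal sketch, with hypothetical function and variable names and made-up prediction values:

```python
AUX_WEIGHT = 0.4   # weight of the auxiliary head (third conv block)
MAIN_WEIGHT = 0.6  # weight of the main network output


def combine_outputs(main_pred, aux_pred):
    """Average main and auxiliary predictions with weights 0.6 and 0.4."""
    return [MAIN_WEIGHT * m + AUX_WEIGHT * a for m, a in zip(main_pred, aux_pred)]


# Hypothetical residualized-score predictions for two subjects.
combined = combine_outputs([2.0, -1.0], [1.0, 0.0])
print(combined)
```

Because the combined prediction enters the loss, gradients reach the third block through the auxiliary head as well as through the deeper layers.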
We estimate the quality of the models by the Mean Squared Error (MSE) between the predicted scores and the pre-residualized fluid intelligence scores. The models were trained by minimizing the MSE loss with the Adam optimizer. The learning rate was set to 3e-5, the batch size to 10, and each network was trained until the loss on the validation set started to increase.
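The evaluation metric and the stopping rule can be sketched as follows. The `run_epoch` callback, the function names and the validation curve are hypothetical; in a real training loop the callback would run one pass of Adam updates and return the validation MSE, and the weights of the best epoch would be checkpointed.

```python
LEARNING_RATE = 3e-5  # values stated in the text, shown here for reference only
BATCH_SIZE = 10


def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def train_with_early_stopping(run_epoch, max_epochs=100):
    """Call run_epoch() (returns validation MSE) until the loss first increases."""
    best_loss, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        val_loss = run_epoch()
        if val_loss > best_loss:
            break  # validation loss started to increase
        best_loss, best_epoch = val_loss, epoch
    return best_epoch, best_loss


# Hypothetical validation curve: the loss rises at the fourth epoch.
curve = iter([75.2, 72.9, 71.8, 72.4, 71.0])
best_epoch, best_loss = train_with_early_stopping(lambda: next(curve))
print(best_epoch, best_loss)  # 2 71.8
```

Stopping at the first increase is the strictest reading of the rule in the text; a patience of several epochs is a common relaxation, but it is not mentioned there.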
To train the model we use multi-modal input data: brain scan data (T1-weighted imagery after preprocessing) and gray matter segmented brain masks. For each subject, the two three-dimensional images were stacked as channels of a single image. We fed the resulting two-channel 3D image into the VoxCNN network as input.
We use cross-validation to increase the model performance: we divide the training sample into two separate parts and train two neural networks with the same architecture, one on each part independently. Then, for the validation subjects, an ensemble of these two models, defined as a weighted average of their predictions, is applied. The weights for averaging are determined based on the validation performance of each model (the test predictions of the network with the lower MSE score on validation were given the larger weight). The number of layers, the stride and the position of the ReLU blocks were adjusted correspondingly.
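The ensemble step can be sketched as a weighted average whose weights favor the model with the lower validation MSE. The text does not state how the weights were derived, so the inverse-MSE rule below is only one plausible choice, and the function names and numbers are hypothetical.

```python
def inverse_mse_weights(val_mses):
    """Assumed rule: weights proportional to 1 / validation MSE, summing to 1."""
    inverse = [1.0 / m for m in val_mses]
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_predict(predictions_per_model, weights):
    """Weighted average of per-subject predictions from several models."""
    return [
        sum(w * p for w, p in zip(weights, subject_preds))
        for subject_preds in zip(*predictions_per_model)
    ]


# Hypothetical validation MSEs of the two networks trained on the two halves.
weights = inverse_mse_weights([71.8, 72.1])

# Hypothetical predictions of each network for two subjects.
model_a = [1.0, -2.0]
model_b = [3.0, 0.0]
print(weights, ensemble_predict([model_a, model_b], weights))
```

With validation errors this close, inverse-MSE weights stay near 0.5 each, so the ensemble behaves almost like a plain average.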
The training set consists of n = 3739 samples, the validation set of n = 415 samples, and the test set of n = 4515 samples.
The models were implemented in PyTorch and trained on a single GPU [CPC16].
4 Experimental results
Table 1 presents the deep neural network architectures used and the corresponding results for fluid intelligence prediction. The predictive capacity of brain morphometric characteristics is considered as the baseline.
The most accurate prediction (in terms of MSE on the validation set) was obtained as a weighted average of the predictions of two VoxCNN neural networks trained on different parts of the training sample:
1. VoxCNN network, trained on both brain T1 images and segmented images.
2. VoxCNN network (with auxiliary head for better convergence), trained on brain T1 images, segmented images and additional socio-demographic data. We used segmented brain masks and full brain imagery after pre-processing.
The MSE scores on the Validation set for single networks and for ensembles built by averaging predictions with adjusted weights are listed in Table 1.
On the Test set, the two ensemble models yielded MSE scores of 92.8378 and 94.0808, respectively.
5 Discussion
All constructed regression models provided an MSE of approximately 71 on the Validation set. These results are comparable to the baseline result, calculated using morphological characteristics on the Validation set.
This incremental improvement and the rather high errors across all models could be attributed to both the study design and data inconsistency: structural T1-weighted images alone may not be sufficient to predict fluid intelligence scores, whereas functional brain data such as fMRI might have more predictive power for cognitive assessment.
The top performing model was the combination (a weighted average prediction) of two VoxCNN neural networks trained on different parts of the training sample, which highlights the potential strength of model ensembles. It yielded an MSE of 71.314 on the Validation set and 92.838 on the Test set.
6 Conclusion
In this work, ensembles of VoxCNN networks were applied for the first time to a 3D brain imagery regression task. Based on the results obtained with this architecture, we consider it a consistent predictive tool for large datasets with heavy, multi-modal inputs.
Due to the rich structure of the considered dataset, there is considerable room for further improvement. Future work on hyperparameter optimization is needed to achieve better network convergence. We can use advanced approaches to the initialization of neural network parameters [BE16] and to the construction of ensembles [BP13]. Sparse 3D convolutions could decrease memory requirements [NKB18].
Transfer learning and domain adaptation techniques could potentially show better performance [GMK+17, LZC+17, GWB+16]. We can also utilize multi-fidelity approaches when solving the regression problem with multi-modal data [BZ15, ZB17a, ZB17b]. The conformal prediction framework [KBB18, BV14, BN16] is a ready-to-use tool for assessing prediction uncertainty.
The considered problem was formulated in the scope of the Project “Machine Learning and Pattern Recognition for the development of diagnostic and clinical prognostic prediction tools in psychiatry, borderline mental disorders, and neurology” (a part of the Skoltech Biomedical Initiative program).
Acknowledgements
The work was supported by the Russian Science Foundation under Grant 19-41-04109.
References
- [ATS09] Brian B Avants, Nick Tustison, and Gang Song. Advanced normalization tools (ANTS). Insight J, 2:1–35, 2009.
- [ATW+11] Brian B Avants, Nicholas J Tustison, Jue Wu, Philip A Cook, and James C Gee. An open source multivariate framework for n-tissue segmentation with evaluation on public data. Neuroinformatics, 9(4):381–400, 2011.
- [BBT+15] Sandra A Brown, Ty Brumback, Kristin Tomlinson, Kevin Cummins, Wesley K Thompson, Bonnie J Nagel, Michael D De Bellis, Stephen R Hooper, Duncan B Clark, Tammy Chung, et al. The National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA): a multisite study of adolescent development and substance use. Journal of Studies on Alcohol and Drugs, 76(6):895–908, 2015.
- [BE16] E. Burnaev and P. Erofeev. The influence of parameter initialization on the training time and accuracy of a nonlinear regression model. Journal of Communications Technology and Electronics, 61(6):646–660, Jun 2016.
- [BN16] E. Burnaev and I. Nazarov. Conformalized kernel ridge regression. In 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pages 45–52, 2016.
- [BP13] E. V. Burnaev and P. V. Prikhod'ko. On a method for constructing ensembles of regression models. Automation and Remote Control, 74(10):1630–1644, Oct 2013.
- [BV14] E. Burnaev and V. Vovk. Efficiency of conformalized ridge regression. In Maria Florina Balcan, Vitaly Feldman, and Csaba Szepesvari, editors, Proceedings of The 27th Conference on Learning Theory, volume 35 of Proceedings of Machine Learning Research, pages 605–622, Barcelona, Spain, 13–15 Jun 2014. PMLR.
- [BZ15] E. Burnaev and A. Zaytsev. Surrogate modeling of multifidelity data for large samples. Journal of Communications Technology and Electronics, 60(12):1348–1355, 2015.
