# Convolutional Neural Networks for Classification of Alzheimer's Disease: Overview and Reproducible Evaluation

**Authors:** Junhao Wen, Elina Thibeau-Sutre, Mauricio Diaz-Melo, Jorge Samper-Gonzalez, Alexandre Routier, Simona Bottani, Didier Dormont, Stanley Durrleman, Ninon Burgos, Olivier Colliot

arXiv: 1904.07773 · 2020-06-02

## TL;DR

This study reviews CNN-based methods for classifying Alzheimer's disease (AD) from anatomical MRI, extends an open-source framework to make such experiments reproducible, and compares several CNN architectures under a strict validation protocol designed to give an unbiased estimate of performance.

## Contribution

It provides a systematic literature review showing that more than half of the surveyed papers may suffer from data leakage, extends an open-source framework for AD classification with CNNs and T1-weighted MRI, and rigorously compares CNN architectures using strict validation protocols.
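One common form of data leakage is splitting at the image or session level instead of the participant level, so that scans of the same person end up in both the training and test sets. The Python sketch below checks a split for this kind of overlap. It is illustrative only: the `find_subject_leakage` helper and the BIDS-style subject/session IDs are hypothetical and are not part of the authors' framework.

```python
from itertools import combinations


def find_subject_leakage(splits):
    """Return the subject IDs shared by each pair of splits.

    `splits` maps a split name (e.g. "train") to a list of
    (subject_id, session_id) pairs. A non-empty overlap means that images
    of the same participant appear in two splits.
    """
    # Collapse sessions: leakage is defined at the participant level here.
    subjects = {name: {subj for subj, _ in rows} for name, rows in splits.items()}
    leaks = {}
    for a, b in combinations(sorted(subjects), 2):
        shared = subjects[a] & subjects[b]
        if shared:
            leaks[(a, b)] = sorted(shared)
    return leaks


# Hypothetical example: sub-02 has one session in train and another in test.
splits = {
    "train": [("sub-01", "ses-M00"), ("sub-02", "ses-M00")],
    "validation": [("sub-03", "ses-M00")],
    "test": [("sub-02", "ses-M12"), ("sub-04", "ses-M00")],
}
print(find_subject_leakage(splits))  # {('test', 'train'): ['sub-02']}
```

An empty dictionary means no participant is shared between any two splits. This catches only one kind of leakage; it says nothing about, for example, model selection performed on the test set.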

## Key findings

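- The 3D approaches (3D-subject, 3D-ROI, 3D-patch) perform similarly; the 2D slice approach performs worse
- CNNs do not outperform an SVM trained on voxel-based features
- Models generalize well to similar populations but not to datasets with different inclusion criteria or demographic characteristics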
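The input types compared above differ in how one MRI volume is turned into network inputs. The 3D-subject approach uses the whole volume, the 3D-patch approach cuts it into cubic sub-volumes, and the 2D slice approach uses individual slices. The Python sketch below only enumerates slice indices and patch corners for a given volume shape. The helper names, the volume shape, the patch size, the stride and the number of discarded border slices are all hypothetical and are not the values used in the paper.

```python
def slice_indices(shape, axis, discard=0):
    """Indices of the 2D slices taken along `axis`.

    `discard` slices are dropped at each end of the axis (border slices
    often contain little brain tissue).
    """
    n = shape[axis]
    return list(range(discard, n - discard))


def patch_origins(shape, size, stride):
    """Corner coordinates of cubic 3D patches of side `size`.

    Patches move by `stride` voxels along each axis and never extend
    past the volume boundary.
    """
    starts = [range(0, dim - size + 1, stride) for dim in shape]
    return [(x, y, z) for x in starts[0] for y in starts[1] for z in starts[2]]


shape = (100, 100, 100)  # hypothetical volume shape in voxels
print(len(slice_indices(shape, axis=0, discard=20)))    # 60 slices
print(len(patch_origins(shape, size=50, stride=50)))    # 8 non-overlapping patches
print(len(patch_origins(shape, size=50, stride=25)))    # 27 overlapping patches
```

The counts show why the approaches see different amounts of data per participant: one volume yields a single 3D-subject input, but several patches or dozens of slices.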

## Abstract

Over 30 papers have proposed using convolutional neural networks (CNNs) for Alzheimer's disease (AD) classification from anatomical MRI. However, classification performance is difficult to compare across studies because of variations in components such as participant selection, image preprocessing or validation procedure. Moreover, these studies are hardly reproducible because their frameworks are not publicly accessible and implementation details are lacking. Lastly, some of these papers may report biased performance due to inadequate or unclear validation or model selection procedures. In the present work, we aim to address these limitations through three main contributions. First, we performed a systematic literature review and found that more than half of the surveyed papers may have suffered from data leakage. Second, we extended our open-source framework for classification of AD using CNNs and T1-weighted MRI. Finally, we used this framework to rigorously compare different CNN architectures. The data was split into training/validation/test sets at the very beginning, and only the training/validation sets were used for model selection. To avoid any overfitting, the test sets were left untouched until the end of the peer-review process. Overall, the different 3D approaches (3D-subject, 3D-ROI, 3D-patch) achieved similar performance, while the 2D slice approach performed worse. Of note, none of the CNN approaches performed better than an SVM with voxel-based features. The different approaches generalized well to similar populations but not to datasets with different inclusion criteria or demographic characteristics.
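The validation protocol described above starts with a single split made before any model selection. The Python sketch below shows one way to do this at the participant level with a fixed seed so the split can be reproduced. The `split_subjects` helper, its default fractions and the subject IDs are hypothetical; a real split would typically also stratify by diagnosis (and possibly by age and sex), which is omitted here.

```python
import random


def split_subjects(subject_ids, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Split unique subject IDs once into disjoint train/validation/test sets.

    Splitting by subject (not by image or session) keeps all scans of a
    participant in the same set. The test set is carved out first and
    should not be consulted again until the final evaluation.
    """
    subjects = sorted(set(subject_ids))  # deduplicate sessions; sort for reproducibility
    rng = random.Random(seed)  # fixed seed so the same split is produced every run
    rng.shuffle(subjects)
    n_test = round(len(subjects) * test_fraction)
    test, rest = subjects[:n_test], subjects[n_test:]
    # Validation subjects are drawn only from the non-test pool.
    n_val = round(len(rest) * val_fraction)
    return {"train": rest[n_val:], "validation": rest[:n_val], "test": test}


ids = [f"sub-{i:02d}" for i in range(1, 11) for _ in range(2)]  # 10 subjects, 2 sessions each
split = split_subjects(ids)
print({name: len(s) for name, s in split.items()})  # {'train': 6, 'validation': 2, 'test': 2}
```

Model selection (architecture, hyperparameters, early stopping) would then use only the `train` and `validation` subjects.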

---
Source: https://tomesphere.com/paper/1904.07773