# A Machine Learning Pipeline for Prognostic Modeling of Alzheimer’s Disease Using Multimodal Data

**Authors:** Luisa De Palma, Vito Ivano D’Alessandro, Filippo Attivissimo, Anna Maria Lucia Lanzolla, Emilio Merlo Pich, Attilio Di Nisio

PMC · DOI: 10.3390/s26051523 · Sensors (Basel, Switzerland) · 2026-02-28

## TL;DR

This paper introduces a machine learning pipeline that predicts time to Alzheimer’s disease progression from multimodal data. It achieves strong discriminative performance, measured by the concordance index, using a small number of features.

## Contribution

A novel survival analysis pipeline for Alzheimer’s prognosis that integrates multimodal ADNI data, handles missing values natively, and is designed to be robust to cross-site and cross-instrument variability.

## Key findings

- A concordance index of 0.92 was achieved using 13 features from multimodal data.
- High predictive performance (C-index of 0.90) was maintained even with only 4 features.
- Inclusion of underexplored biomarkers like lipid metabolites improved prognostic modeling.
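
The concordance index reported above measures how well a model ranks subjects by risk. Among comparable pairs, it is the fraction in which the subject who converted to AD earlier received the higher predicted risk, so 0.5 corresponds to random ranking and 1.0 to perfect ranking. The following is a minimal pure-Python sketch of Harrell's C-index under one common convention (pairs with tied times are skipped, tied predictions count as half). The paper may use a different estimator or tie handling, and the function name and toy inputs are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time per subject (time to AD conversion, or last visit if censored)
    events: 1 if the subject converted to AD, 0 if right-censored
    risk_scores: higher score means higher predicted risk (earlier conversion)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the times differ and the earlier time is an observed
        # event; a censored earlier subject might still convert after the other one.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier converter was ranked as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

For example, `harrell_c_index([12, 24, 36, 48], [1, 1, 0, 0], [0.9, 0.3, 0.4, 0.2])` finds 5 comparable pairs, 4 of them concordant, and returns 0.8. Scores on a hazard (risk) scale can be passed directly as `risk_scores`, whereas functions that expect predicted survival times require the sign to be flipped.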

## Abstract

Accurate prediction of progression to Alzheimer’s disease (AD) is crucial for early intervention and personalized patient management. In this study, we developed a robust, data-driven survival analysis pipeline to model time-to-progression to AD from a baseline diagnosis of cognitively normal (CN) or mild cognitive impairment (MCI), integrating cognitive, clinical, MRI and PET neuroimaging biomarkers, and biospecimen features from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. The ADNI cohort can be regarded as a multi-center platform for multimodal data integration that jointly captures cognitive performance, MRI/PET imaging-sensor biomarkers, and biofluid biosensing assays within a unified prognostic framework. Accordingly, our pipeline is designed to be robust to cross-site and cross-instrument variability through harmonized preprocessing and quality-check-aware integration of heterogeneous multimodal data. We employed eXtreme Gradient Boosting (XGBoost) for survival prediction because it natively handles the missing values frequently observed in real-world clinical datasets. Our results show that strong predictive performance can be achieved with a minimal set of features, yielding a concordance index (C-index) of 0.92 with 13 features and 0.90 with only 4 features. These findings underscore the importance of multi-domain feature integration, transparent feature selection, and the inclusion of underexplored biomarkers such as lipid metabolites in prognostic modeling.
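
The abstract credits XGBoost's native handling of missing values for letting heterogeneous modalities be combined without imputation. The sketch below shows one way such data could be prepared. It assumes XGBoost's `survival:cox` objective, which the abstract does not name. Modality tables are merged per subject, absent measurements stay as `NaN`, and right-censored subjects receive negative labels, which is the label convention XGBoost documents for `survival:cox`. The function `build_cox_dataset`, the subject IDs, and the feature names are hypothetical placeholders, not the paper's variables.

```python
import math


def build_cox_dataset(subjects, modalities, feature_names):
    """Merge per-modality tables into feature rows and Cox-style labels.

    subjects: list of (subject_id, time, converted); `time` is the time to AD
        conversion if converted, otherwise the time of the last follow-up.
    modalities: list of dicts mapping subject_id -> {feature_name: value}.
    feature_names: column order of the output feature matrix.
    """
    X, y = [], []
    for sid, time, converted in subjects:
        if time <= 0:
            # A zero time would make "censored" (-0) indistinguishable from an event.
            raise ValueError(f"follow-up time must be positive for {sid}")
        merged = {}
        for table in modalities:
            # A subject absent from a modality table simply contributes no features.
            merged.update(table.get(sid, {}))
        # Missing features become NaN and are deliberately NOT imputed.
        X.append([float(merged.get(name, math.nan)) for name in feature_names])
        # Assumed survival:cox convention: positive = observed conversion time,
        # negative = right-censored at the absolute value.
        y.append(float(time) if converted else -float(time))
    return X, y


# Hypothetical toy cohort: S1 converted to AD at month 24;
# S2 had not converted at the last visit (month 36), so it is right-censored.
subjects = [("S1", 24, True), ("S2", 36, False)]
cognitive = {"S1": {"cog_score": 26.0}, "S2": {"cog_score": 29.0}}
biofluid = {"S1": {"lipid_marker": 0.8}}  # S2 has no biofluid sample
X, y = build_cox_dataset(subjects, [cognitive, biofluid], ["cog_score", "lipid_marker"])
print(y)  # [24.0, -36.0]
print(X)  # [[26.0, 0.8], [29.0, nan]]
```

With the `xgboost` package installed, `X` and `y` could then be wrapped as `xgb.DMatrix(np.asarray(X), label=y)`, which treats `NaN` as missing by default, and passed to `xgb.train` with `{"objective": "survival:cox"}`. During tree construction, each split learns a default direction for missing values, so these values do not need to be filled in beforehand.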

## Linked entities

- **Diseases:** Alzheimer’s disease (MONDO:0004975)

## Full-text entities

- **Diseases:** cognitive impairment (MESH:D003072), MCI (MESH:D060825), AD (MESH:D000544)
- **Chemicals:** lipid (MESH:D008055)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12987005/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12987005/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12987005/full.md

---
Source: https://tomesphere.com/paper/PMC12987005