# A Machine Learning Framework for Cognitive Impairment Screening from Speech with Multimodal Large Models

**Authors:** Shiyu Chen, Ying Tan, Wenyu Hu, Yingxi Chen, Lihua Chen, Yurou He, Weihua Yu, Yang Lü

PMC · DOI: 10.3390/bioengineering13010073 · Bioengineering · 2026-01-08

## TL;DR

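This paper introduces a machine learning framework that screens for cognitive impairment, including mild cognitive impairment and Alzheimer's disease, from speech recorded during structured MMSE tasks, as a non-invasive and scalable alternative to conventional diagnostics.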

## Contribution

The framework combines a pre-trained multimodal large language model with structured MMSE speech tasks and machine learning classifiers for early detection of Alzheimer's disease.

## Key findings

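- Of the fourteen models evaluated, LightGBM and Gradient Boosting classifiers performed best, with an average AUC of 0.9501 across classification tasks covering healthy controls, mild cognitive impairment, and Alzheimer's disease.
- SHAP analysis identified spectral complexity, energy dynamics, and temporal features as the most influential speech features for distinguishing cognitive states, consistent with known speech impairments in early-stage Alzheimer's disease.
- The authors present the framework as suitable for both clinical and telemedicine applications.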
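To make the headline metric concrete, here is a minimal sketch of one common way an "average AUC" over three classes can be computed: a macro-averaged one-vs-rest AUC. This summary does not say which averaging scheme the authors used, so the scheme, the function names `binary_auc` and `macro_ovr_auc`, and the toy predictions are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

LABELS = ("HC", "MCI", "AD")  # the three diagnostic categories named in the abstract


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: List[Dict[str, float]], truth: List[str]) -> float:
    """Average the one-vs-rest AUC over all classes (assumed averaging scheme)."""
    aucs = []
    for label in LABELS:
        scores = [p[label] for p in probs]
        is_pos = [t == label for t in truth]
        aucs.append(binary_auc(scores, is_pos))
    return sum(aucs) / len(aucs)


# Hypothetical toy predictions, not data from the paper.
toy_truth = ["HC", "HC", "MCI", "MCI", "AD", "AD"]
toy_probs = [
    {"HC": 0.80, "MCI": 0.15, "AD": 0.05},
    {"HC": 0.50, "MCI": 0.40, "AD": 0.10},
    {"HC": 0.30, "MCI": 0.60, "AD": 0.10},
    {"HC": 0.60, "MCI": 0.30, "AD": 0.10},
    {"HC": 0.10, "MCI": 0.30, "AD": 0.60},
    {"HC": 0.05, "MCI": 0.25, "AD": 0.70},
]
toy_auc = macro_ovr_auc(toy_probs, toy_truth)
print(f"macro one-vs-rest AUC on toy data: {toy_auc:.4f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be the usual choice for a cohort the size of the one described in the abstract.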

## Abstract

**Background:** Early diagnosis of Alzheimer’s disease (AD) is essential for slowing disease progression and mitigating cognitive decline. However, conventional diagnostic methods are often invasive, time-consuming, and costly, limiting their utility in large-scale screening. There is an urgent need for scalable, non-invasive, and accessible screening tools.

**Methods:** We propose a novel screening framework combining a pre-trained multimodal large language model with structured MMSE speech tasks. An artificial intelligence-assisted multilingual Mini-Mental State Examination system (AAM-MMSE) was utilized to collect voice data from 1098 participants in Sichuan and Chongqing. CosyVoice2 was used to extract speaker embeddings, speech labels, and acoustic features, which were converted into statistical representations. Fourteen machine learning models were developed for subject classification into three diagnostic categories: Healthy Control (HC), Mild Cognitive Impairment (MCI), and Alzheimer’s Disease (AD). SHAP analysis was employed to assess the importance of the extracted speech features.

**Results:** Among the evaluated models, LightGBM and Gradient Boosting classifiers exhibited the highest performance, achieving an average AUC of 0.9501 across classification tasks. SHAP-based analysis revealed that spectral complexity, energy dynamics, and temporal features were the most influential in distinguishing cognitive states, aligning with known speech impairments in early-stage AD.

**Conclusions:** This framework offers a non-invasive, interpretable, and scalable solution for cognitive screening. It is suitable for both clinical and telemedicine applications, demonstrating the potential of speech-based AI models in early AD detection.
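The step in which extracted features are "converted into statistical representations" can be sketched as statistical pooling: a variable-length sequence of frame-level features is collapsed into a fixed-length vector that a tabular classifier can consume. The abstract does not list which statistics were computed, so the mean, standard deviation, minimum, and maximum used here are assumptions, as are the helper name `summarize_frames` and the toy inputs. The sketch does not call CosyVoice2; it starts from features that have already been extracted.

```python
import statistics
from typing import Dict, List


def summarize_frames(frames: List[List[float]], prefix: str) -> Dict[str, float]:
    """Collapse a (num_frames x dim) feature sequence into per-dimension statistics.

    The chosen statistics are illustrative assumptions, not the paper's list.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimensionality")
    summary: Dict[str, float] = {}
    for d in range(dim):
        column = [f[d] for f in frames]
        summary[f"{prefix}_{d}_mean"] = statistics.fmean(column)
        summary[f"{prefix}_{d}_std"] = statistics.pstdev(column)  # population std
        summary[f"{prefix}_{d}_min"] = min(column)
        summary[f"{prefix}_{d}_max"] = max(column)
    return summary


# Hypothetical frame-level features for two recordings of different length.
short_clip = [[1.0, 10.0], [3.0, 10.0]]
long_clip = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0], [2.0, 4.0]]

short_vec = summarize_frames(short_clip, "acoustic")
long_vec = summarize_frames(long_clip, "acoustic")

# Both recordings now map to the same fixed set of named features.
print(sorted(short_vec) == sorted(long_vec))
```

Because every recording yields the same named columns regardless of its duration, the resulting vectors can be stacked into one table, which is the input shape that gradient-boosted tree models and per-feature importance analyses such as SHAP operate on.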

## Linked entities

- **Diseases:** Alzheimer’s disease (MONDO:0004975)

## Full-text entities

- **Diseases:** Cognitive Impairment (MESH:D003072), AD (MESH:D000544), speech impairments (MESH:D013064), MCI (MESH:D060825)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12837762/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12837762/full.md

## References

61 references — full list in the complete paper: https://tomesphere.com/paper/PMC12837762/full.md

---
Source: https://tomesphere.com/paper/PMC12837762