# Cross-Modal Alignment and Rectified Flow-Based Latent Representation Synthesis for Enhanced Speech-Driven Alzheimer’s Disease Detection

**Authors:** Shu Xiang, Haobo Ling, Meihong Wu

PMC · DOI: 10.3390/bioengineering13030370 · Bioengineering · 2026-03-23

## TL;DR

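This paper introduces a method for detecting Alzheimer’s Disease from speech. It aligns speech and EEG features in a shared latent space, then uses Rectified Flow to generate EEG-like latent representations from speech, which raises three-class accuracy over a speech-only baseline.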

## Contribution

A framework that combines cross-modal feature alignment with conditional Rectified Flow for speech-driven Alzheimer’s Disease detection when paired multimodal data are scarce.

## Key findings

- The method achieved 89.08% three-class classification accuracy using fused speech and latent EEG features.
- Accuracy improved by 9.28% over the speech-based baseline system.
- Latent space interpolation compensates for the absence of MCI cases in the EEG dataset, which contains only AD patients and Healthy Controls.
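
The interpolation idea in the last finding can be illustrated with a minimal sketch. The paper describes an adaptive strategy whose details are not given in this summary, so the fixed-weight linear blend below is an assumption, and the function name `interpolate_latents`, the weight `alpha`, and the toy vectors are all hypothetical.

```python
import numpy as np

def interpolate_latents(z_hc, z_ad, alpha=0.5):
    """Linearly blend two latent vectors: alpha=0 returns HC, alpha=1 returns AD.

    Assumption: a fixed-weight linear blend stands in for the paper's
    adaptive interpolation, which this summary does not specify.
    """
    z_hc = np.asarray(z_hc, dtype=float)
    z_ad = np.asarray(z_ad, dtype=float)
    return (1.0 - alpha) * z_hc + alpha * z_ad

# Toy 4-dimensional class centroids (made-up numbers, not data from the paper).
z_hc = np.array([0.0, 1.0, 2.0, 3.0])
z_ad = np.array([4.0, 3.0, 2.0, 1.0])

# A pseudo-MCI latent placed halfway between the two observed classes.
z_mci = interpolate_latents(z_hc, z_ad, alpha=0.5)
```

Placing MCI between HC and AD reflects the usual view of MCI as an intermediate stage. An adaptive scheme would presumably choose the weight per sample instead of fixing it.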

## Abstract

To address the limited accuracy of speech-based Alzheimer’s Disease (AD) screening and the shortage of paired multimodal data, this paper proposes a detection framework based on feature alignment and Rectified Flow-driven latent representation generation. The EEG dataset consists of 36 AD patients and 29 Healthy Controls (HC). The speech dataset contains 399 samples, which include 114 AD cases, 132 Mild Cognitive Impairment (MCI) cases, and 153 HC cases. We extracted multidimensional features of EEG signals, such as time-domain and frequency-domain characteristics, alongside behavioral representations of speech. A heterogeneous alignment network was used to map these features into a common semantic subspace, where an adaptive interpolation strategy reconstructed the missing pathological trajectories of MCI within the latent space. On this basis, a conditional Rectified Flow model was introduced to learn the optimal transport mapping from speech to EEG. This model generated physiological-information-rich latent representations to compensate for semantic gaps. Experimental results showed that the fused features from speech and latent representations achieved a three-class classification accuracy of 89.08%, a precision of 88.77%, and a recall of 88.71%. This performance represented an accuracy improvement of 9.28% compared with the speech-based baseline system. Our method combines the convenience of speech screening with the high reliability of neurophysiological signals, and it provides a new approach for low-cost early detection of AD.
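
As a rough illustration of the Rectified Flow step mentioned in the abstract, the sketch below shows the standard ingredients: a straight-line path between a start point and a target, a velocity regression target, and Euler integration at sampling time. It is not the authors' implementation. The summary does not say whether the flow starts from noise or from a speech latent, so `x0` is a generic start point, and `rf_training_pair`, `rf_loss`, `euler_sample`, and the oracle velocity function are hypothetical names used only for this example.

```python
import numpy as np

def rf_training_pair(x0, x1, t):
    """Return the point x_t on the straight path and the velocity target x1 - x0."""
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0

def rf_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))

def euler_sample(velocity_fn, x0, cond, steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt, cond)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=8)      # start point (assumption: noise or a speech latent)
x1 = rng.normal(size=8)      # target EEG latent (toy values)
speech = rng.normal(size=8)  # conditioning vector, ignored by the oracle below

# One training example: a trained network would regress v_target from (x_t, t, speech).
x_t, v_target = rf_training_pair(x0, x1, t=0.3)

# Oracle velocity standing in for a trained model; on a straight path it is constant.
def oracle_velocity(x, t, cond):
    return x1 - x0

x_gen = euler_sample(oracle_velocity, x0, speech, steps=10)
```

With the oracle velocity the path is exactly straight, so Euler integration reaches the target regardless of step count. A learned velocity field only approximates this, which is why straighter paths allow fewer sampling steps.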

## Linked entities

- **Diseases:** Alzheimer’s Disease (MONDO:0004975)

## Full-text entities

- **Diseases:** Cognitive Impairment (MESH:D003072), MCI (MESH:D060825), AD (MESH:D000544)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13024342/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13024342/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/PMC13024342/full.md

---
Source: https://tomesphere.com/paper/PMC13024342