MINT: Multimodal Imaging-to-Speech Knowledge Transfer for Early Alzheimer's Screening
Vrushank Ahire, Yogesh Kumar, Anouck Girard, M. A. Ganaie

TL;DR
This paper introduces MINT, a framework that transfers neuroimaging biomarkers into speech representations at training time, enabling early Alzheimer's screening without MRI at inference and combining biological grounding with practical deployment.
Contribution
MINT is the first method to transfer MRI-derived biomarkers into speech models, improving early Alzheimer's detection and reducing reliance on costly neuroimaging during inference.
Findings
Speech aligned with MRI biomarkers achieves accuracy comparable to that of MRI-based classifiers.
Multimodal fusion enhances screening performance beyond MRI alone.
MRI-to-speech transfer enables biologically grounded, scalable Alzheimer's screening.
Abstract
Alzheimer's disease is a progressive neurodegenerative disorder in which mild cognitive impairment (MCI) marks a critical transition between aging and dementia. Neuroimaging modalities, such as structural MRI, provide biomarkers of this transition; however, their high costs and infrastructure needs limit their deployment at a population scale. Speech analysis offers a non-invasive alternative, but speech-only classifiers are developed independently of neuroimaging, leaving decision boundaries biologically ungrounded and limiting reliability on the subtle CN-versus-MCI distinction. We propose MINT (Multimodal Imaging-to-Speech Knowledge Transfer), a three-stage cross-modal framework that transfers biomarker structure from MRI into a speech encoder at training time. An MRI teacher, trained on 1,228 subjects, defines a compact neuroimaging embedding space for CN-versus-MCI classification.…
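The abstract is truncated here, so the paper's actual three stages, losses, and architectures are not shown. As a rough illustration of the general idea only (a frozen teacher embedding space that a speech-side model is trained to match, so that MRI is not needed at inference), here is a minimal sketch. Everything in it is an assumption rather than the authors' method: the function name `align_student_to_teacher`, the linear projection, the mean-squared-error alignment objective, the dimensions, and the synthetic data.

```python
import numpy as np

def align_student_to_teacher(speech_feats, teacher_emb, lr=0.1, steps=200, seed=0):
    """Fit a linear projection W so that speech_feats @ W approximates the
    frozen teacher embeddings under a mean-squared-error objective.

    Hypothetical stand-in for cross-modal alignment; not the paper's method.
    """
    rng = np.random.default_rng(seed)
    n, d_in = speech_feats.shape
    d_out = teacher_emb.shape[1]
    W = rng.normal(scale=0.01, size=(d_in, d_out))
    losses = []
    for _ in range(steps):
        err = speech_feats @ W - teacher_emb          # student minus teacher
        losses.append(float(np.mean(err ** 2)))       # alignment loss (MSE)
        grad = 2.0 * speech_feats.T @ err / (n * d_out)
        W -= lr * grad                                # plain gradient descent
    return W, losses

# Synthetic demo: paired speech features and (frozen) teacher embeddings.
demo_rng = np.random.default_rng(1)
speech = demo_rng.normal(size=(64, 16))               # assumed speech features
teacher = speech @ demo_rng.normal(size=(16, 4))      # assumed MRI embeddings
W, losses = align_student_to_teacher(speech, teacher)

# At inference only speech is required: project it into the teacher's space.
new_speech = demo_rng.normal(size=(5, 16))
projected = new_speech @ W
```

In this toy setup the teacher embeddings are used only during fitting; the final projection step consumes speech features alone, which mirrors the "no MRI at inference" property described above.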
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Dementia and Cognitive Impairment Research
