MINT: Multimodal Imaging-to-Speech Knowledge Transfer for Early Alzheimer's Screening

Vrushank Ahire; Yogesh Kumar; Anouck Girard; M. A. Ganaie

arXiv:2602.23994 · cs.LG · March 2, 2026

Open Access

TL;DR

This paper introduces MINT, a framework that transfers MRI-derived biomarker structure into speech representations during training, enabling early Alzheimer's screening without MRI at inference and combining biological grounding with practical deployment.

Contribution

MINT is the first method to transfer MRI-derived biomarkers into speech models, improving early Alzheimer's detection and reducing reliance on costly neuroimaging during inference.

Findings

01. Speech aligned with MRI biomarkers achieves accuracy comparable to MRI-based classifiers.

02. Multimodal fusion enhances screening performance beyond MRI alone.

03. MRI-to-speech transfer enables biologically grounded, scalable Alzheimer's screening.

Abstract

Alzheimer's disease is a progressive neurodegenerative disorder in which mild cognitive impairment (MCI) marks a critical transition between aging and dementia. Neuroimaging modalities, such as structural MRI, provide biomarkers of this transition; however, their high costs and infrastructure needs limit their deployment at a population scale. Speech analysis offers a non-invasive alternative, but speech-only classifiers are developed independently of neuroimaging, leaving decision boundaries biologically ungrounded and limiting reliability on the subtle CN-versus-MCI distinction. We propose MINT (Multimodal Imaging-to-Speech Knowledge Transfer), a three-stage cross-modal framework that transfers biomarker structure from MRI into a speech encoder at training time. An MRI teacher, trained on 1,228 subjects, defines a compact neuroimaging embedding space for CN-versus-MCI classification.…
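
The abstract is cut off before it names the objective used to move MRI structure into the speech encoder, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a frozen teacher that has already produced one embedding per subject, paired speech features for the same subjects, and a plain mean-squared-error alignment fitted with gradient descent on a linear head. All names (`train_alignment_head`, `alignment_loss`, `speech_feats`, `teacher_emb`) and the synthetic data are hypothetical.

```python
import numpy as np


def train_alignment_head(speech_feats, teacher_emb, lr=0.05, steps=2000):
    """Fit a linear map W so that speech_feats @ W approximates teacher_emb.

    Assumption: the teacher embeddings are fixed targets (frozen teacher),
    and alignment is measured by mean squared error.
    """
    n, d_in = speech_feats.shape
    d_out = teacher_emb.shape[1]
    W = np.zeros((d_in, d_out))
    for _ in range(steps):
        residual = speech_feats @ W - teacher_emb    # shape (n, d_out)
        grad = 2.0 * speech_feats.T @ residual / n   # gradient of the MSE loss
        W -= lr * grad
    return W


def alignment_loss(speech_feats, teacher_emb, W):
    """Mean over subjects of the squared distance to the teacher embedding."""
    diff = speech_feats @ W - teacher_emb
    return float(np.mean(np.sum(diff ** 2, axis=1)))


# Synthetic stand-in data: 64 paired subjects, 8-dim speech features,
# 4-dim teacher embedding space. None of these sizes come from the paper.
rng = np.random.default_rng(0)
n_pairs, d_speech, d_embed = 64, 8, 4
speech_feats = rng.normal(size=(n_pairs, d_speech))
true_map = rng.normal(size=(d_speech, d_embed))
teacher_emb = speech_feats @ true_map + 0.01 * rng.normal(size=(n_pairs, d_embed))

W = train_alignment_head(speech_feats, teacher_emb)
loss_before = alignment_loss(speech_feats, teacher_emb, np.zeros((d_speech, d_embed)))
loss_after = alignment_loss(speech_feats, teacher_emb, W)
print(f"alignment loss before: {loss_before:.4f}, after: {loss_after:.6f}")
```

Once such a head is fitted, speech alone can be projected into the teacher's embedding space, which is the property that lets screening run without MRI at inference. A real system would replace the linear head with a trained speech encoder and may use a different alignment loss.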

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Dementia and Cognitive Impairment Research