Layer-Aware Early Fusion of Acoustic and Linguistic Embeddings for Cognitive Status Classification
Krystof Novotny, Laureano Moro-Velázquez, Jiri Mekyska

TL;DR
This study explores how layer-aware early fusion of acoustic and linguistic embeddings from speech and transcription improves cognitive status classification, revealing optimal layer combinations and the importance of modality-specific fusion strategies.
Contribution
It introduces a layer-aware early fusion approach that enhances multimodal speech analysis for cognitive decline detection, emphasizing the significance of internal layer selection.
Findings
Mid-layer embeddings (layers 8-10) yield peak performance.
Acoustic-only models outperform text-only models.
Early fusion enhances acoustic modality discrimination.
Abstract
Speech contains both acoustic and linguistic patterns that reflect cognitive decline, and therefore models describing only one domain cannot fully capture such complexity. This study investigates how early fusion (EF) of speech and its corresponding transcription text embeddings, with attention to encoder layer depth, can improve cognitive status classification. Using a DementiaBank-derived collection of recordings (1,629 speakers; cognitively normal controls (CN), Mild Cognitive Impairment (MCI), and Alzheimer's Disease and Related Dementias (ADRD)), we extracted frame-aligned embeddings from different internal layers of wav2vec 2.0 or Whisper combined with DistilBERT or RoBERTa. Unimodal, EF and late fusion (LF) models were trained with a transformer classifier, optimized, and then evaluated across 10 seeds. Performance consistently peaked in…
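The distinction between early and late fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion operator, so this example assumes that EF is a feature-wise concatenation of frame-aligned embeddings and that LF is a weighted average of class probabilities from two unimodal models. The function names, the layer indices in the variable names, the 768-dimensional placeholder arrays, and the equal weighting are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def early_fusion(acoustic: np.ndarray, text: np.ndarray) -> np.ndarray:
    """Concatenate frame-aligned embeddings along the feature axis.

    acoustic: (T, D_a) array, text: (T, D_t) array -> (T, D_a + D_t) array.
    Concatenation is an assumption; the abstract does not name the operator.
    """
    if acoustic.shape[0] != text.shape[0]:
        raise ValueError("embeddings must be frame-aligned (same number of frames)")
    return np.concatenate([acoustic, text], axis=-1)


def late_fusion(p_acoustic: np.ndarray, p_text: np.ndarray, weight: float = 0.5) -> np.ndarray:
    """Weighted average of class probabilities from two unimodal classifiers (assumed scheme)."""
    return weight * p_acoustic + (1.0 - weight) * p_text


# Random placeholders standing in for embeddings taken from one internal layer
# of each encoder; real features would come from the pretrained models.
rng = np.random.default_rng(0)
num_frames = 50
acoustic_layer_9 = rng.normal(size=(num_frames, 768))  # hypothetical acoustic layer
text_layer_4 = rng.normal(size=(num_frames, 768))      # hypothetical text layer

fused = early_fusion(acoustic_layer_9, text_layer_4)   # input to a single classifier

# Hypothetical three-class (CN, MCI, ADRD) probabilities from two unimodal models.
combined = late_fusion(np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.4, 0.2]))
```

In this sketch, layer awareness amounts to choosing which internal layer supplies each array before fusion. Under EF a single classifier sees both modalities jointly, whereas under LF each modality is classified separately and only the decisions are merged.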
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Voice and Speech Disorders · Dementia and Cognitive Impairment Research
