A Multimodal Approach for Dementia Detection from Spontaneous Speech with Tensor Fusion Layer
Loukas Ilias, Dimitris Askounis, John Psarras

TL;DR
This paper introduces a multimodal deep learning framework that combines transformer-based models with a tensor fusion layer to improve early detection of Alzheimer's disease from spontaneous speech, reporting up to 86.25% accuracy.
Contribution
The paper proposes an end-to-end trainable neural network architecture for multimodal dementia detection that uses transformer models and a tensor fusion layer to capture inter- and intra-modal interactions.
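As an illustration of what a tensor fusion layer computes in general, here is a minimal NumPy sketch. It assumes two modalities and the common formulation in which each embedding is extended with a constant 1 before taking the outer product, so the result holds both the unimodal features and every pairwise cross-modal product. The function name `tensor_fusion`, the embedding sizes, and the example values are hypothetical; the paper's actual layer and dimensions are not given in this summary.

```python
import numpy as np


def tensor_fusion(audio_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Fuse two batches of embeddings via the outer product of 1-augmented vectors.

    audio_emb: (batch, d_a), text_emb: (batch, d_t)
    returns:   (batch, (d_a + 1) * (d_t + 1))
    """
    batch = audio_emb.shape[0]
    ones = np.ones((batch, 1), dtype=audio_emb.dtype)
    # Prepending 1 keeps each modality's own features in the fused tensor.
    a = np.concatenate([ones, audio_emb], axis=1)  # (batch, d_a + 1)
    t = np.concatenate([ones, text_emb], axis=1)   # (batch, d_t + 1)
    # Per-sample outer product: every audio feature times every text feature.
    fused = np.einsum("bi,bj->bij", a, t)          # (batch, d_a + 1, d_t + 1)
    return fused.reshape(batch, -1)


# Hypothetical toy embeddings: one sample, 2 audio features, 3 text features.
audio = np.array([[2.0, 3.0]])
text = np.array([[5.0, 7.0, 11.0]])
fused = tensor_fusion(audio, text)
```

In this toy case the first row of the unflattened product is the text embedding itself, the first column is the audio embedding, and the remaining entries are the cross-modal products. The fused vector grows multiplicatively with the embedding sizes, so implementations typically keep the per-modality embeddings small before fusing.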
Findings
Achieves up to 86.25% accuracy and 85.48% F1-score.
Outperforms existing multimodal approaches.
Effectively models inter-modal interactions with tensor fusion.
Abstract
Alzheimer's disease (AD) is a progressive neurological disorder, meaning that its symptoms develop gradually over the years. It is also the main cause of dementia, which affects memory, thinking skills, and mental abilities. Researchers have recently turned their interest towards AD detection from spontaneous speech, since it constitutes a time-effective procedure. However, existing state-of-the-art multimodal works rely on early and late fusion approaches and do not take the inter- and intra-modal interactions into account. To tackle these limitations, we propose deep neural networks that can be trained end-to-end and capture the inter- and intra-modal interactions. Firstly, each audio file is converted to an image consisting of three channels, i.e., log-Mel spectrogram, delta, and delta-delta. Next, each transcript is passed…
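The three-channel audio representation described in the abstract can be sketched as follows. This is a simplified illustration, not the authors' preprocessing: it assumes the log-Mel spectrogram has already been computed, and it approximates delta and delta-delta with plain finite differences along the time axis (`np.gradient`), whereas audio libraries usually compute them with a smoothed estimate over a window of frames. The function name `stack_with_deltas` and the toy input are hypothetical.

```python
import numpy as np


def stack_with_deltas(log_mel: np.ndarray) -> np.ndarray:
    """Build a 3-channel 'image' from a log-Mel spectrogram.

    log_mel: (n_mels, n_frames)
    returns: (3, n_mels, n_frames) = [log-Mel, delta, delta-delta]
    """
    # Finite-difference stand-in for delta features (change over time).
    delta = np.gradient(log_mel, axis=1)
    # Delta-delta is the change of the delta channel over time.
    delta_delta = np.gradient(delta, axis=1)
    return np.stack([log_mel, delta, delta_delta], axis=0)


# Hypothetical input: 4 Mel bands, 6 frames, values rising linearly in time.
log_mel = np.tile(np.arange(6, dtype=float), (4, 1))
image = stack_with_deltas(log_mel)
```

For this linearly rising input the delta channel is constant and the delta-delta channel is zero, which makes the roles of the three channels easy to check. The resulting array has the channel-first layout that image models commonly expect.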
Taxonomy
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Linear Layer · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Residual Connection · Dropout · Attention Dropout · Label Smoothing
