The INESC-ID Multi-Modal System for the ADReSS 2020 Challenge
Anna Pompili, Thomas Rolland, Alberto Abad

TL;DR
This paper presents a multi-modal system that combines acoustic and textual features for Alzheimer's disease detection. It achieves 81.25% accuracy in the challenge, and its results highlight that linguistic features matter more than acoustic ones.
Contribution
The paper introduces a novel multi-modal classification framework that integrates acoustic and textual embeddings for Alzheimer's detection in the ADReSS 2020 challenge.
Findings
Linguistic features outperform acoustic features in classification accuracy.
Combining linguistic and acoustic features does not significantly improve performance.
The system achieved an accuracy of 81.25% in the challenge.
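The findings compare single-modality systems with a combined one. The abstract only says that acoustic and textual embeddings are extracted independently and later combined, so it does not specify the fusion scheme. The sketch below assumes two common options. Early fusion normalizes each modality and concatenates the vectors before a single classifier. Late fusion averages the scores of per-modality classifiers. The function names and the `weight` parameter are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance.
    A constant column (std 0) is left centered rather than divided by zero."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def early_fusion(acoustic, textual):
    """Hypothetical early fusion: normalize each modality, then concatenate
    the per-speaker acoustic and textual vectors into one feature vector."""
    if len(acoustic) != len(textual):
        raise ValueError("both modalities must cover the same speakers")
    a = zscore_columns(acoustic)
    t = zscore_columns(textual)
    return [ra + rt for ra, rt in zip(a, t)]


def late_fusion(scores_acoustic, scores_textual, weight=0.5):
    """Hypothetical late fusion: weighted average of per-speaker scores
    produced by separately trained acoustic and textual classifiers."""
    return [weight * sa + (1.0 - weight) * st
            for sa, st in zip(scores_acoustic, scores_textual)]
```

Normalizing each modality first stops a high-variance block, such as a large speaker embedding, from dominating the concatenated vector. In practice, the normalization statistics would be fitted on training speakers only and then reused on test speakers. This sketch computes them on whatever set it receives, for simplicity.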
Abstract
This paper describes a multi-modal approach to the automatic detection of Alzheimer's disease, proposed as part of the INESC-ID Human Language Technology Laboratory's participation in the ADReSS 2020 challenge. Our classification framework takes advantage of both acoustic and textual feature embeddings, which are extracted independently and later combined. Speech signals are encoded into acoustic features using DNN speaker embeddings extracted from pre-trained models. For textual input, contextual embedding vectors are first extracted with an English BERT model and then used either to directly compute sentence embeddings or to feed a bidirectional LSTM-RNN with attention. Finally, an SVM classifier with a linear kernel is used to evaluate each of the three systems individually. Our best system, based on the combination of linguistic and acoustic information, attained a…
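The abstract says that sentence embeddings are computed directly from BERT contextual vectors and then classified with a linear-kernel SVM. The sketch below illustrates those two steps under stated assumptions. It uses mean pooling over token vectors, because the abstract does not give the pooling method. It also replaces whatever SVM implementation the authors used with a plain-Python Pegasos solver, a stochastic subgradient method for the L2-regularized hinge loss of a linear SVM. The helper names (`mean_pool`, `add_bias`, `train_linear_svm`, `predict`) and the toy data are hypothetical.

```python
import random


def mean_pool(token_vectors):
    """Average token-level contextual embeddings into one sentence vector.
    Mean pooling is an assumption, not necessarily the paper's choice."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]


def add_bias(x):
    """Append a constant 1.0 so the bias is learned as an extra weight."""
    return list(x) + [1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    X: bias-augmented feature vectors; y: labels in {-1, +1}.
    Returns the weight vector (the bias is its last component).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step size
            active = y[i] * dot(w, X[i]) < 1   # hinge loss is nonzero here
            # Gradient step on the regularizer: shrink weights toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if active:
                # Subgradient step on the hinge loss for this example.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    return 1 if dot(w, x) >= 0 else -1


if __name__ == "__main__":
    # Toy stand-ins for per-speaker token embeddings
    # (BERT-base produces 768-dimensional vectors; 2-d here for brevity).
    speakers = [
        [[2.0, 1.5], [2.5, 2.0]],
        [[3.0, 3.5], [2.8, 3.1]],
        [[-2.0, -1.0], [-2.5, -2.2]],
        [[-3.0, -2.5], [-2.7, -3.3]],
    ]
    labels = [1, 1, -1, -1]  # hypothetical encoding: +1 = AD, -1 = non-AD
    X = [add_bias(mean_pool(s)) for s in speakers]
    w = train_linear_svm(X, labels)
    print([predict(w, x) for x in X])
```

Folding the bias into the weight vector means it is regularized too. This is a common simplification for Pegasos, and it avoids the unstable early steps of an unregularized bias update. A library SVM would normally handle the intercept separately.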
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders
