Multi-modal fusion with gating using audio, lexical and disfluency features for Alzheimer's Dementia recognition from spontaneous speech

Morteza Rohanian; Julian Hough; Matthew Purver

arXiv:2106.09668·cs.LG·June 18, 2021

TL;DR

This paper presents a multi-modal fusion model using gating mechanisms to combine audio, lexical, and disfluency features for improved Alzheimer's Disease detection and severity prediction from spontaneous speech.

Contribution

It introduces a novel gating-based fusion approach that integrates unimodal LSTM decisions for better cognitive impairment assessment.

Findings

01 Model achieves promising results on ADReSS challenge datasets.
02 Disfluency features relate to cognitive impairment levels.
03 Sequence modeling effectively detects Alzheimer's from speech data.

Abstract

This paper is a submission to the Alzheimer's Dementia Recognition through Spontaneous Speech (ADReSS) challenge, which aims to develop methods that can assist in the automated prediction of the severity of Alzheimer's Disease from speech data. We focus on acoustic and natural language features for cognitive impairment detection in spontaneous speech, in the context of Alzheimer's Disease diagnosis and mini-mental state examination (MMSE) score prediction. We propose a model that obtains unimodal decisions from different LSTMs, one for each modality of text and audio, and then combines them using a gating mechanism for the final prediction. We focus on sequential modelling of text and audio and investigate whether the disfluencies present in individuals' speech relate to the extent of their cognitive impairment. Our results show that the proposed classification and regression schemes…
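
The abstract says the unimodal LSTM decisions are combined with a gating mechanism, but this page does not give the formula. The following is a minimal sketch of one common form of gated fusion, assuming a sigmoid gate computed from the concatenated text and audio vectors and used to blend them per dimension. The function name `gated_fusion`, the weight layout, and the toy vectors (which stand in for LSTM outputs) are all hypothetical and are not taken from the paper or the authors' repository.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_text, h_audio, weights, bias):
    """Blend two equal-length unimodal vectors with a per-dimension gate.

    Assumed form (not confirmed by the source):
        g     = sigmoid(W [h_text; h_audio] + b)
        fused = g * h_text + (1 - g) * h_audio
    `weights` has one row per output dimension; each row spans the
    concatenation of both input vectors.
    """
    if len(h_text) != len(h_audio):
        raise ValueError("unimodal vectors must have the same length")
    concat = list(h_text) + list(h_audio)
    fused = []
    for row, b, t, a in zip(weights, bias, h_text, h_audio):
        gate = sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        # A gate near 1 trusts the text vector, near 0 the audio vector.
        fused.append(gate * t + (1.0 - gate) * a)
    return fused


# Toy inputs standing in for the final states of a text LSTM and an audio LSTM.
rng = random.Random(0)
dim = 4
h_text = [rng.uniform(-1, 1) for _ in range(dim)]
h_audio = [rng.uniform(-1, 1) for _ in range(dim)]
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
bias = [0.0] * dim

fused = gated_fusion(h_text, h_audio, weights, bias)
```

In a trained model the gate weights would be learned jointly with the LSTMs, and the fused vector would feed a classification head (AD versus non-AD) or a regression head (MMSE score). Because each gate lies strictly between 0 and 1, every fused value lies between the corresponding text and audio values.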

Code & Models

Repositories

mortezaro/ad-recognition-from-speech
Official
