Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection
Yilin Pan, Yanpei Shi, Yijia Zhang, Mingyu Lu

TL;DR
Swin-BERT is a speech-based system for early Alzheimer's dementia detection that fuses acoustic and linguistic features and decouples the influence of age and gender to improve accuracy.
Contribution
The paper introduces Swin-BERT, a feature fusion system that integrates acoustic and linguistic information with age and gender decoupling for improved dementia detection.
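A minimal sketch of this kind of fusion follows. It is an illustration under stated assumptions, not the authors' implementation. Plain Python lists stand in for the Swin (acoustic) and BERT (linguistic) embeddings. The age scaling, one-hot gender encoding, concatenation-based fusion and logistic head are all hypothetical choices, as are the helper names `encode_speaker`, `acoustic_branch`, `fuse` and `ad_probability`.

```python
import math
from typing import List, Sequence


def encode_speaker(age: float, gender: str, age_scale: float = 100.0) -> List[float]:
    """Hypothetical metadata encoding: scaled age followed by a one-hot gender vector."""
    if gender not in ("female", "male"):
        raise ValueError(f"unsupported gender label: {gender!r}")
    return [age / age_scale, float(gender == "female"), float(gender == "male")]


def acoustic_branch(acoustic: Sequence[float], age: float, gender: str) -> List[float]:
    """Append age/gender to the acoustic embedding so later layers can account for them.
    Assumption: 'extra input' is modelled here as simple concatenation."""
    return list(acoustic) + encode_speaker(age, gender)


def fuse(acoustic: Sequence[float], linguistic: Sequence[float],
         age: float, gender: str) -> List[float]:
    """Late fusion: concatenate the conditioned acoustic vector with the linguistic vector."""
    return acoustic_branch(acoustic, age, gender) + list(linguistic)


def ad_probability(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic (sigmoid) head over the fused vector; weights are placeholders, not trained values."""
    if len(features) != len(weights):
        raise ValueError("feature/weight length mismatch")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `fuse([0.2, -0.1], [0.5, 0.3, 0.0], age=72, gender="female")` yields an 8-dimensional vector: the two acoustic values, the three metadata values, then the three linguistic values. In a trained system the head's weights would be learned jointly with both branches.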
Findings
Achieved 85.58% F-score on ADReSS dataset.
Achieved 87.32% F-score on ADReSSo dataset.
Outperformed previous methods on both datasets.
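The figures above are F-scores. The sketch below assumes the standard binary F1, the harmonic mean of precision and recall, with AD as the positive class. This page does not state whether the reported numbers are for one class or averaged over the AD and non-AD classes. The function name `f1_score` is used here only for illustration.

```python
from typing import Hashable, Sequence


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1) -> float:
    """Binary F1 = 2 * precision * recall / (precision + recall) for the given positive label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With `y_true = [1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 1, 0]` there are 2 true positives, 1 false positive and 1 false negative. Precision and recall are therefore both 2/3, and so is F1 (approximately 0.667).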
Abstract
Speech is commonly used to build automatic Alzheimer's dementia (AD) detection systems, because acoustic and linguistic abilities decline in people living with AD from the early stages. However, speech carries not only AD-related local and global information but also information unrelated to cognitive status, such as age and gender. In this paper, we propose a speech-based system named Swin-BERT for automatic dementia detection. For the acoustic part, our acoustic-based system is built on shifted-window multi-head attention, which was originally proposed to extract local and global information from images. To decouple the effect of age and gender on acoustic feature extraction, both are used as extra inputs to the acoustic system. For the linguistic part, the rhythm-related information, which varies significantly between people living with and without AD,…
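The shifted-window idea borrowed from image models can be sketched on a 2-D time-frequency grid. This is a minimal illustration that assumes the acoustic input is treated as an image-like grid of patches. It shows only the window partitioning and the cyclic shift used by Swin Transformer. The per-window multi-head attention and the attention mask that Swin applies to wrapped-around tokens are omitted. `cyclic_shift`, `window_partition` and `swin_windows` are hypothetical helper names.

```python
from typing import List

Grid = List[List[float]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll rows and columns up/left by `shift` (like torch.roll with a negative shift)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)] for i in range(h)]


def window_partition(grid: Grid, m: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping m x m windows, in row-major window order."""
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("grid height and width must be divisible by the window size")
    windows = []
    for r in range(0, h, m):
        for c in range(0, w, m):
            windows.append([row[c:c + m] for row in grid[r:r + m]])
    return windows


def swin_windows(grid: Grid, m: int, shifted: bool) -> List[Grid]:
    """Regular windows in one block; windows over a grid shifted by m // 2 in the next.
    Self-attention would then be computed independently inside each window."""
    source = cyclic_shift(grid, m // 2) if shifted else grid
    return window_partition(source, m)
```

On a 4 x 4 grid holding the values 0 to 15 with `m = 2`, the first regular window is `[[0, 1], [4, 5]]`, while the first shifted window is `[[5, 6], [9, 10]]`. That shifted window mixes tokens from all four regular windows. Alternating the two layouts is how local (within-window) attention gradually gathers more global context.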
Taxonomy
Topics: Emotion and Mood Recognition
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention
