Attention-Based Acoustic Feature Fusion Network for Depression Detection
Xiao Xu, Yang Wang, Xinru Wei, Fei Wang, Xizhe Zhang

TL;DR
This paper introduces ABAFnet, a deep learning model that uses an attention mechanism to fuse multiple acoustic speech features, with the aim of improving the accuracy of depression detection from clinical speech data.
Contribution
The study proposes a novel attention-based acoustic feature fusion network with a weight adjustment module for enhanced depression detection performance.
Findings
Outperforms previous methods on clinical speech databases
Highlights the importance of MFCC features in depression detection
Effective integration of multi-tiered acoustic features
Abstract
Depression is a common mental disorder that significantly affects individuals and imposes a considerable societal burden. The complexity and heterogeneity of the disorder call for prompt and effective detection, which remains a difficult challenge and highlights an urgent need for improved detection methods. Exploiting auditory data through advanced machine learning paradigms is a promising research direction. Yet existing techniques mainly rely on single-dimensional feature models, potentially neglecting the abundance of information hidden in various speech characteristics. To address this, we present the novel Attention-Based Acoustic Feature Fusion Network (ABAFnet) for depression detection. ABAFnet combines four different acoustic features into a comprehensive deep learning model, thereby effectively integrating and blending multi-tiered features.…
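As a rough illustration of what attention-based feature fusion means in general, the sketch below weights four per-feature embedding vectors with softmax attention scores and sums them into one fused vector. It is not the ABAFnet architecture, whose layers and weight adjustment module are not described in this summary. The function name attention_fuse, the query vector, the embedding size of 8, and the placeholder feature names are all assumptions made for the example; only MFCC is named in the text above.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def attention_fuse(embeddings, query):
    """Fuse per-feature embeddings with scaled dot-product attention.

    embeddings: array of shape (n_features, dim), one row per acoustic feature.
    query: array of shape (dim,), a (normally learned) scoring vector.
    Returns the fused vector (dim,) and the attention weights (n_features,).
    """
    dim = embeddings.shape[1]
    scores = embeddings @ query / np.sqrt(dim)   # one score per feature
    weights = softmax(scores)                    # non-negative, sums to 1
    fused = weights @ embeddings                 # weighted sum of the rows
    return fused, weights

# Hypothetical inputs: random vectors stand in for the outputs of four
# feature-specific encoders. Names other than "mfcc" are placeholders.
rng = np.random.default_rng(0)
feature_names = ["mfcc", "feature_b", "feature_c", "feature_d"]
embeddings = rng.normal(size=(len(feature_names), 8))
query = rng.normal(size=8)

fused, weights = attention_fuse(embeddings, query)
for name, w in zip(feature_names, weights):
    print(f"{name}: attention weight {w:.3f}")
```

In a trained model the query and the encoders would be learned jointly, so the attention weights would indicate how much each acoustic feature contributes to the fused representation for a given input.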
Taxonomy
Topics: Emotion and Mood Recognition · Voice and Speech Disorders · Speech Recognition and Synthesis
