DepMamba: Progressive Fusion Mamba for Multimodal Depression Detection

Jiaxin Ye; Junping Zhang; Hongming Shan

arXiv:2409.15936·cs.CY·September 25, 2024

TL;DR

DepMamba is an audio-visual model for depression detection that combines hierarchical contextual modeling with progressive multimodal fusion. It targets two weaknesses of earlier multimodal methods, inefficient long-range temporal modeling and sub-optimal fusion, and is reported to outperform existing methods on large-scale datasets.

Contribution

The paper proposes a hierarchical and progressive fusion framework that combines state space models (SSMs) with convolutional neural networks (CNNs) for multimodal depression detection.

Findings

01 Outperforms existing methods on large-scale datasets

02 Effectively models long-range temporal dependencies

03 Enhances multimodal fusion accuracy

Abstract

Depression is a common mental disorder that affects millions of people worldwide. Although promising, current multimodal methods hinge on aligned or aggregated multimodal fusion and suffer from two significant limitations: (i) inefficient long-range temporal modeling, and (ii) sub-optimal multimodal fusion between intermodal fusion and intramodal processing. In this paper, we propose an audio-visual progressive fusion Mamba for multimodal depression detection, termed DepMamba. DepMamba features two core designs: hierarchical contextual modeling and progressive multimodal fusion. On the one hand, hierarchical modeling introduces convolutional neural networks (CNNs) and Mamba to extract the local-to-global features within long-range sequences. On the other hand, the progressive fusion first presents a multimodal collaborative State Space Model (SSM) extracting intermodal and intramodal information for…
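
The local-to-global idea in the abstract (a convolution for local context, followed by a state space recurrence for long-range context) can be illustrated with a toy example. The sketch below is not the authors' implementation and is not Mamba itself: it assumes a single scalar channel and a linear SSM with fixed parameters a, b, c, whereas Mamba makes its parameters input-dependent (selective). All function names are hypothetical.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Local stage: 1-D cross-correlation with zero padding (output length == input length)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * (k - 1 - pad)
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def ssm_scan(u: List[float], a: float, b: float, c: float) -> List[float]:
    """Global stage: scalar discrete linear SSM.

    h_t = a * h_{t-1} + b * u_t
    y_t = c * h_t
    The parameters are fixed here (an assumption of this sketch); Mamba's
    selective SSM computes them from the input at every step.
    """
    h = 0.0
    ys = []
    for u_t in u:
        h = a * h + b * u_t  # state carries information from all earlier steps
        ys.append(c * h)
    return ys


def local_to_global(x: List[float], kernel: List[float],
                    a: float, b: float, c: float) -> List[float]:
    """Apply the convolution first (local), then the SSM scan (global)."""
    return ssm_scan(conv1d_same(x, kernel), a, b, c)
```

With an identity kernel `[0.0, 1.0, 0.0]` and `a = 0.5`, an impulse at the first time step decays geometrically through the state, which shows how the recurrence propagates information across the whole sequence in a single linear-time pass.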

Code & Models

Repositories

Jiaxin-Ye/DepMamba (PyTorch, official)

Taxonomy

Topics: Emotion and Mood Recognition

Methods: Convolution · Mamba: Linear-Time Sequence Modeling with Selective State Spaces