Multi-modal Depression Estimation based on Sub-attentional Fusion
Ping-Cheng Wei, Kunyu Peng, Alina Roitberg, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

TL;DR
This paper introduces a multi-modal depression estimation method that fuses modalities with a sub-attention mechanism on top of a Convolutional Bidirectional LSTM backbone, reaching 0.89 precision and a 0.70 F1-score for depression detection on the DAIC-WOZ benchmark.
Contribution
The paper proposes a sub-attention-based fusion approach for multi-modal depression detection that outperforms conventional late fusion and reduces preprocessing complexity.
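To make the contrast between late fusion and attention-style fusion concrete, here is a minimal sketch in plain Python. It is a generic soft-attention weighting over modality features, not the paper's sub-attention mechanism, whose details are not given in this summary. The function names, the toy feature vectors, and the hand-picked query vector are all hypothetical; in a trained model the query would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(modality_probs):
    """Late fusion: each modality predicts on its own, outputs are averaged."""
    return sum(modality_probs) / len(modality_probs)


def attention_fusion(features, query):
    """Generic soft-attention fusion (an assumption, not the paper's method).

    Each modality's feature vector is scored by its dot product with a
    query vector; the softmax of those scores weights the feature vectors
    before they are summed into one fused representation.
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(query)
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights


# Toy 2-dimensional features for audio, visual, and text (hypothetical values).
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 1.0]  # hand-picked here; learned in a real model

fused_vector, modality_weights = attention_fusion(toy_features, toy_query)
averaged_prob = late_fusion([0.2, 0.4, 0.6])
```

The design difference is where the modalities meet: late fusion combines final predictions with fixed weights, while attention-style fusion combines intermediate features with weights that depend on the input.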
Findings
Achieved 0.89 precision and 0.70 F1-score in depression detection.
Obtained 4.92 MAE in depression severity estimation.
Outperformed conventional late fusion approaches.
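For readers less familiar with the metrics above, the following sketch shows how precision, F1-score, and MAE are computed. The helper names and the toy labels and scores are hypothetical and are not data from the paper. The last helper inverts the F1 formula: taking the reported 0.89 precision and 0.70 F1-score at face value (both are rounded), the implied recall is roughly 0.58.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary detection metrics; label 1 is the positive (depressed) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted severity scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_from_precision_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    return f1 * precision / (2 * precision - f1)


# Toy data (hypothetical, for illustration only).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_preds = [1, 1, 0, 0, 0, 1]
toy_p, toy_r, toy_f1 = precision_recall_f1(toy_labels, toy_preds)

toy_mae = mean_absolute_error([10, 4, 17], [12, 4, 13])

# Recall implied by the reported (rounded) precision and F1-score.
implied_recall = recall_from_precision_f1(0.89, 0.70)
```

A high precision paired with a lower recall means the detector rarely raises a false alarm but misses a share of true cases, which is worth keeping in mind when reading the headline numbers.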
Abstract
Failure to timely diagnose and effectively treat depression leads to over 280 million people suffering from this psychological disorder worldwide. The information cues of depression can be harvested from diverse heterogeneous resources, e.g., audio, visual, and textual data, raising demand for new effective multi-modal fusion approaches for automatic estimation. In this work, we tackle the task of automatically identifying depression from multi-modal data and introduce a sub-attention mechanism for linking heterogeneous information while leveraging Convolutional Bidirectional LSTM as our backbone. To validate this idea, we conduct extensive experiments on the public DAIC-WOZ benchmark for depression assessment featuring different evaluation modes and taking gender-specific biases into account. The proposed model yields effective results with 0.89 precision and 0.70 F1-score in detecting…
Taxonomy
Topics: Emotion and Mood Recognition · Mental Health via Writing · Sentiment Analysis and Opinion Mining
Methods: Masked autoencoder · Tanh Activation · Sigmoid Activation · Long Short-Term Memory
