Multimodal Magic: Elevating Depression Detection with a Fusion of Text and Audio Intelligence
Lindy Gan, Yifan Huang, Xiaoyang Gao, Jiaming Tan, Fujun Zhao, Tao Yang

TL;DR
This paper introduces a multimodal fusion model that uses a teacher-student architecture with multi-head attention to improve the accuracy of depression detection from text and audio data, outperforming unimodal and conventional multimodal methods.
Contribution
The study presents a novel multimodal fusion framework with multi-head attention and weighted transfer learning, significantly enhancing depression classification performance.
Findings
Achieved an F1 score of 99.1% on the DAIC-WOZ test set.
Outperformed unimodal and conventional multimodal approaches.
Demonstrated robustness and adaptability in complex data scenarios.
Abstract
This study proposes an innovative multimodal fusion model based on a teacher-student architecture to enhance the accuracy of depression classification. Our designed model addresses the limitations of traditional methods in feature fusion and modality weight allocation by introducing multi-head attention mechanisms and weighted multimodal transfer learning. Leveraging the DAIC-WOZ dataset, the student fusion model, guided by textual and auditory teacher models, achieves significant improvements in classification accuracy. Ablation experiments demonstrate that the proposed model attains an F1 score of 99.1% on the test set, significantly outperforming unimodal and conventional approaches. Our method effectively captures the complementarity between textual and audio features while dynamically adjusting the contributions of the teacher models to enhance generalization capabilities. The…
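The idea of a student guided by two weighted teachers can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss or the weighting rule, so everything here is an assumption, including the function names, the rule that the teacher with the lower per-sample loss receives more weight, and the `alpha` and `temperature` parameters. The sketch combines a hard-label cross-entropy term with a weighted sum of KL divergences from the text and audio teachers to the student.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def teacher_weights(text_logits, audio_logits, label):
    """Hypothetical weighting rule (an assumption, not from the paper):
    the teacher with the lower loss on this sample gets the larger weight."""
    losses = [
        cross_entropy(softmax(text_logits), label),
        cross_entropy(softmax(audio_logits), label),
    ]
    return softmax([-loss for loss in losses])


def distillation_loss(student_logits, text_logits, audio_logits, label,
                      alpha=0.5, temperature=2.0):
    """Weighted two-teacher distillation loss for one sample.

    alpha balances the hard-label term against the teacher-guided term;
    temperature softens the distributions that are compared.
    Both are illustrative hyperparameters.
    """
    w_text, w_audio = teacher_weights(text_logits, audio_logits, label)
    student_soft = softmax(student_logits, temperature)
    guided = (
        w_text * kl_divergence(softmax(text_logits, temperature), student_soft)
        + w_audio * kl_divergence(softmax(audio_logits, temperature), student_soft)
    )
    hard = cross_entropy(softmax(student_logits), label)
    return alpha * hard + (1 - alpha) * guided


# Toy two-class example (class 1 = depressed); the logits are made up.
text_logits = [0.2, 2.5]     # text teacher is confident in the true class
audio_logits = [1.0, 0.8]    # audio teacher is unsure
student_logits = [0.5, 1.5]
weights = teacher_weights(text_logits, audio_logits, label=1)
loss = distillation_loss(student_logits, text_logits, audio_logits, label=1)
print("teacher weights (text, audio):", weights)
print("student loss:", loss)
```

In this toy setup the text teacher receives the larger weight because it assigns more probability to the true class. In a full model the student logits would come from a fusion network over text and audio features, which the sketch leaves out.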
Taxonomy
Topics: Pharmacy and Medical Practices · Advanced Text Analysis Techniques · Diverse Approaches in Healthcare and Education Studies
Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention
