Predicting Depression Severity by Multi-Modal Feature Engineering and Fusion
Aven Samareh, Yan Jin, Zhangyang Wang, Xiangyu Chang, Shuai Huang

TL;DR
This paper introduces a multi-modal fusion model that combines audio, video, and text features to predict depression severity, demonstrating improved accuracy over single modalities on the AVEC 2017 dataset.
Contribution
The paper presents a novel multi-modal fusion approach for depression prediction using vocal, linguistic, and facial features, outperforming single modality models.
Findings
Multi-modal fusion outperforms individual modality models.
The model surpasses the AVEC 2017 data set baseline by a clear margin.
Combining modalities improves depression severity prediction accuracy.
Abstract
We present our preliminary work to determine whether patients' vocal acoustic, linguistic, and facial patterns can predict clinical ratings of depression severity, namely the Patient Health Questionnaire depression scale (PHQ-8). We propose a multi-modal fusion model that combines three modalities: audio, video, and text features. Trained on the AVEC 2017 data set, our proposed model outperforms each single-modality prediction model and surpasses the data set baseline by a nice margin.
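The abstract does not describe how the three modalities are fused. As a rough illustration only, the sketch below shows one common approach, late fusion, in which each modality produces its own PHQ-8 prediction and the predictions are averaged with weights inversely proportional to each modality's validation error. The weighting scheme, the function names (`rmse`, `fusion_weights`, `fuse`), and the toy numbers are all assumptions made for this example; none of them come from the paper.

```python
from math import sqrt

def rmse(pred, truth):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

def fusion_weights(val_preds, val_truth):
    """Hypothetical weighting: inverse validation RMSE, normalised to sum to 1."""
    inv = {m: 1.0 / max(rmse(p, val_truth), 1e-9) for m, p in val_preds.items()}
    total = sum(inv.values())
    return {m: w / total for m, w in inv.items()}

def fuse(preds, weights):
    """Weighted average of per-modality predictions, one value per subject."""
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) for i in range(n)]

# Toy PHQ-8 scores (the scale runs from 0 to 24); these are made-up values.
val_truth = [4.0, 10.0, 16.0]
val_preds = {
    "audio": [6.0, 9.0, 13.0],
    "video": [3.0, 12.0, 18.0],
    "text": [5.0, 10.0, 15.0],
}

weights = fusion_weights(val_preds, val_truth)
fused = fuse(val_preds, weights)
print(weights)
print(fused)
```

In this toy setup the modality with the lowest validation error receives the largest weight. Because the fused score is a convex combination of the single-modality scores, its RMSE cannot exceed that of the worst individual modality.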
Taxonomy
Topics: Emotion and Mood Recognition · Mental Health via Writing · Sentiment Analysis and Opinion Mining
