Depression Scale Recognition from Audio, Visual and Text Analysis
Shubham Dham, Anirudh Sharma, Abhinav Dhall

TL;DR
This paper presents a multimodal machine learning approach that combines audio, visual, and text features to detect depression automatically, improving on the baseline by 17% for audio features and 24.5% for video features on the DAIC-WOZ dataset.
Contribution
It introduces a multimodal feature extraction and decision-level fusion method for depression recognition that applies machine learning classifiers to audio, visual, and text data.
Findings
Outperformed the baseline by 17% on audio features.
Outperformed the baseline by 24.5% on video features.
Demonstrated effectiveness of multimodal fusion for depression detection.
Abstract
Depression is a major mental health disorder that is rapidly affecting lives worldwide. It affects not only the emotional but also the physical and psychological state of a person. Its symptoms include lack of interest in daily activities, feeling low, anxiety, frustration, weight loss and even feelings of self-hatred. This report describes our work for the Audio Visual Emotion Challenge (AVEC) 2017, carried out during our second-year BTech summer internship. Given the growing demand for automatic depression detection using machine learning algorithms, we present our multimodal feature extraction and decision-level fusion approach. Features are extracted from the provided Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) database. Gaussian Mixture Model (GMM) clustering and a Fisher vector approach were applied to the visual data; statistical…
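Decision-level fusion, as named in the abstract, combines the outputs of separately trained per-modality models rather than their raw features. The abstract excerpt does not state which combination rule the authors used, so the sketch below assumes a simple weighted average, which is one common choice. The function name `fuse_decisions`, the modality keys, the weights, and the numeric scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def fuse_decisions(predictions: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-modality predictions with a weighted average.

    This is an assumed fusion rule for illustration, not necessarily
    the rule used in the paper.
    """
    if not predictions:
        raise ValueError("need at least one modality prediction")
    total_weight = sum(weights[m] for m in predictions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[m] * predictions[m] for m in predictions)
    return weighted_sum / total_weight


# Hypothetical per-modality depression scores for one interview.
preds = {"audio": 8.0, "visual": 12.0, "text": 10.0}

# Equal weights reduce to a plain mean of the three scores.
equal_weights = {"audio": 1.0, "visual": 1.0, "text": 1.0}
fused_equal = fuse_decisions(preds, equal_weights)

# Unequal weights let a more reliable modality dominate the result.
visual_heavy = {"audio": 1.0, "visual": 3.0, "text": 0.0}
fused_visual = fuse_decisions(preds, visual_heavy)

print(fused_equal, fused_visual)
```

In practice the weights would be tuned on a validation split, for example in proportion to how well each modality's model performs on its own.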
Taxonomy
Topics: Emotion and Mood Recognition · EEG and Brain-Computer Interfaces · Mental Health via Writing
