Multi-modal deep learning system for depression and anxiety detection
Brian Diep, Marija Stanojevic, Jekaterina Novikova

TL;DR
This paper introduces a multi-modal deep learning system that combines audio, text, and hand-crafted features to improve the detection of depression and anxiety from speech, advancing digital mental health screening.
Contribution
It presents a novel multi-modal model that integrates deep-learned and hand-crafted features to detect depression and anxiety from self-administered speech tasks.
Findings
Augmenting hand-crafted features with deep-learned features improves F1 scores.
Speech-based biomarkers show promise for digital mental health screening.
The system outperforms baseline models using only hand-crafted features.
Abstract
Traditional screening practices for anxiety and depression pose an impediment to monitoring and treating these conditions effectively. However, recent advances in NLP and speech modelling allow textual, acoustic, and hand-crafted language-based features to jointly form the basis of future mental health screening and condition detection. Speech is a rich and readily available source of insight into an individual's cognitive state and by leveraging different aspects of speech, we can develop new digital biomarkers for depression and anxiety. To this end, we propose a multi-modal system for the screening of depression and anxiety from self-administered speech tasks. The proposed model integrates deep-learned features from audio and text, as well as hand-crafted features that are informed by clinically-validated domain knowledge. We find that augmenting hand-crafted features with…
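As a rough illustration of the integration described in the abstract, the sketch below concatenates audio, text, and hand-crafted feature vectors and scores the fused vector with a logistic function. The abstract does not say how the modalities are combined, so fusion by concatenation, the logistic scorer, the function names (fuse_features, predict_proba), the feature names, and all numeric values are assumptions made for illustration only, not the authors' method.

```python
import math
import random


def fuse_features(audio_emb, text_emb, handcrafted):
    """Fuse modalities by simple concatenation (an assumed fusion strategy)."""
    return list(audio_emb) + list(text_emb) + list(handcrafted)


def predict_proba(features, weights, bias):
    """Logistic score in (0, 1) computed over the fused feature vector."""
    if len(features) != len(weights):
        raise ValueError("feature and weight lengths differ")
    z = bias + sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy inputs; a real system would take these from trained
# audio and text encoders and from a clinical feature extractor.
audio_emb = [0.12, -0.40, 0.33]   # stand-in for a deep audio embedding
text_emb = [0.05, 0.27]           # stand-in for a deep text embedding
handcrafted = [1.8, 0.6]          # e.g. speech rate, pause ratio (illustrative)

fused = fuse_features(audio_emb, text_emb, handcrafted)

# Random weights stand in for parameters that would normally be learned.
rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in fused]
score = predict_proba(fused, weights, bias=0.0)
```

A baseline restricted to hand-crafted features would, under the same assumptions, pass only the handcrafted vector to predict_proba; comparing F1 scores of the two variants on labelled data mirrors the comparison reported in the findings.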
Taxonomy
Topics: Mental Health via Writing · Digital Mental Health Interventions · Emotion and Mood Recognition
