Speech-based Clinical Depression Screening: An Empirical Study
Yangbin Chen, Chenyang Xu, Chunfeng Liang, Yanbao Tao, Chuan Shi

TL;DR
This study demonstrates that speech signals, especially deep speech features, are effective markers for AI-based depression screening across various interaction scenarios, with human-computer interaction achieving accuracy comparable to clinical interviews.
Contribution
The study provides an empirical evaluation of speech-based depression screening, highlighting the effectiveness of deep speech features and the impact of interaction scenarios on model performance.
Findings
Deep speech features outperform traditional acoustic features.
Human-computer interaction matches the accuracy of clinical interviews.
The duration and number of speech segments influence model performance.
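The last finding refers to how each recording is cut into segments before classification. The sketch below splits a recording into non-overlapping, fixed-duration clips, which shows the trade-off directly: shorter clips yield more segments per participant, longer clips yield fewer but richer ones. The function name `segment_recording`, the non-overlapping windowing, the optional `max_clips` cap, and dropping the trailing partial clip are all illustrative assumptions, not the paper's documented procedure.

```python
from typing import List, Optional, Sequence


def segment_recording(
    samples: Sequence[float],
    sample_rate: int,
    clip_seconds: float,
    max_clips: Optional[int] = None,
) -> List[List[float]]:
    """Split a mono recording into non-overlapping clips of fixed duration.

    Hypothetical helper: trailing samples shorter than one full clip are
    dropped, and `max_clips` optionally caps how many clips are kept.
    """
    if sample_rate <= 0 or clip_seconds <= 0:
        raise ValueError("sample_rate and clip_seconds must be positive")
    clip_len = int(round(clip_seconds * sample_rate))  # samples per clip
    if clip_len == 0:
        raise ValueError("clip_seconds is too short for this sample_rate")
    n_clips = len(samples) // clip_len  # only full-length clips
    if max_clips is not None:
        n_clips = min(n_clips, max_clips)
    return [list(samples[i * clip_len:(i + 1) * clip_len]) for i in range(n_clips)]


# A 10-second recording at 16 kHz: 3 s clips give 3 segments, 1 s clips give 10.
recording = [0.0] * (10 * 16000)
print(len(segment_recording(recording, 16000, 3.0)))  # 3
print(len(segment_recording(recording, 16000, 1.0)))  # 10
```

Varying `clip_seconds` and `max_clips` is one simple way to study the effect of segment duration and quantity on a downstream classifier.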
Abstract
This study investigates the utility of speech signals for AI-based depression screening across varied interaction scenarios, including psychiatric interviews, chatbot conversations, and text readings. Participants include depressed patients recruited from the outpatient clinics of Peking University Sixth Hospital and control group members from the community, all diagnosed by psychiatrists following standardized diagnostic protocols. We extracted acoustic and deep speech features from each participant's segmented recordings. Clips were classified using neural networks or SVMs, and the clip-level outcomes were aggregated to determine each participant's final assessment. Our analysis across interaction scenarios, speech processing techniques, and feature types confirms speech as a crucial marker for depression screening. Specifically, human-computer interaction matches clinical interview efficacy, surpassing reading…
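The abstract states that clip-level outcomes are aggregated into a participant-level assessment but does not say how. The sketch below shows two common aggregation rules under that assumption: majority voting over hard clip labels and thresholding the mean clip probability. Both function names, the label encoding (1 = depressed, 0 = control), the 0.5 threshold, and breaking ties toward the positive class are hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter
from typing import Sequence


def aggregate_majority(clip_labels: Sequence[int]) -> int:
    """Participant label by majority vote over clip labels (1 = depressed, 0 = control).

    Ties are broken toward 1, an assumed screening-oriented choice that
    favours sensitivity over specificity.
    """
    if not clip_labels:
        raise ValueError("need at least one clip label")
    if any(label not in (0, 1) for label in clip_labels):
        raise ValueError("clip labels must be 0 or 1")
    counts = Counter(clip_labels)
    return 1 if counts[1] >= counts[0] else 0


def aggregate_mean_probability(clip_probs: Sequence[float], threshold: float = 0.5) -> int:
    """Participant label by thresholding the mean of per-clip depression probabilities."""
    if not clip_probs:
        raise ValueError("need at least one clip probability")
    mean_prob = sum(clip_probs) / len(clip_probs)
    return int(mean_prob >= threshold)


print(aggregate_majority([1, 0, 1]))                   # 1
print(aggregate_mean_probability([0.1, 0.2, 0.3]))     # 0
```

Majority voting only needs hard labels (as an SVM produces by default), while mean-probability aggregation keeps per-clip confidence and suits a neural network with a probabilistic output.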
Taxonomy
Topics: Digital Mental Health Interventions
Methods: Contrastive Language-Image Pre-training
