Robust Speech and Natural Language Processing Models for Depression Screening
Y. Lu, A. Harati, T. Rutowski, R. Oliveira, P. Chlebek, E. Shriberg

TL;DR
This paper presents two deep learning models for depression screening from conversational speech, one acoustic and one based on natural language processing. Both perform at or above AUC = 0.80 on unseen data and are generally robust across speakers and sessions.
Contribution
The paper introduces two transfer learning-based models for depression detection from speech and language, showing their robustness and potential for automated screening.
Findings
Both models achieve AUC ≥ 0.80 on unseen data.
Models are robust across different speakers and sessions.
Transfer learning enhances model performance.
Abstract
Depression is a global health concern with a critical need for increased patient screening. Speech technology offers advantages for remote screening but must perform robustly across patients. We describe two deep learning models developed for this purpose. One model is based on acoustics; the other is based on natural language processing. Both models employ transfer learning. The data come from a depression-labeled corpus in which 11,000 unique users interacted with a human-machine application using conversational speech. Results on binary depression classification show that both models perform at or above AUC=0.80 on unseen data with no speaker overlap. Performance is further analyzed as a function of test subset characteristics, finding that the models are generally robust over speaker and session variables. We conclude that models based on these approaches offer promise…
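The evaluation setup mentioned in the abstract (binary classification scored by AUC on test data with no speaker overlap) can be illustrated with a minimal sketch. This is not the authors' code: the function names `speaker_disjoint_split` and `roc_auc`, the 20% test fraction, and the toy inputs are all assumptions made for illustration. The split assigns whole speakers to either train or test, and AUC is computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one.

```python
import random


def speaker_disjoint_split(speaker_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) such that no speaker appears in both sets.

    Hypothetical helper: whole speakers, not individual sessions, are
    assigned to the test set.
    """
    speakers = sorted(set(speaker_ids))
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train_idx = [i for i, s in enumerate(speaker_ids) if s not in test_speakers]
    test_idx = [i for i, s in enumerate(speaker_ids) if s in test_speakers]
    return train_idx, test_idx


def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives; ties count half.

    Quadratic in the number of examples, which is fine for a small sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In use, a model would be trained only on sessions indexed by `train_idx`, and `roc_auc` would be applied to its scores on the `test_idx` sessions, so that the reported AUC reflects speakers the model has never seen.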
Taxonomy
Topics: Mental Health via Writing
