TL;DR
This paper presents a deep learning approach for detecting depression and anxiety from speech, trained on a proprietary dataset of ~65,000 utterances from more than 23,000 subjects. It achieves 71% sensitivity and specificity, and the model is released for research use.
Contribution
It introduces a deep learning model, trained on a large-scale proprietary speech dataset, that detects mental health biomarkers from speech, with the aim of improving predictive performance over approaches based on hand-engineered acoustic features.
Findings
Models can extract content-agnostic biomarker information.
Combining biomarker and lexical features improves prediction.
The model achieved 71% sensitivity and specificity on a large dataset.
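Sensitivity and specificity are reported separately from overall accuracy because they measure different things: sensitivity is the fraction of true cases the model flags, and specificity is the fraction of non-cases it correctly clears. The sketch below shows how the two are computed from binary labels. The function name and the toy labels are illustrative assumptions, not the paper's evaluation code or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = condition present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels, made up for illustration only (not results from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A classifier that outputs a score rather than a label can usually trade one metric against the other by moving its decision threshold, so reporting both at the same value describes a single operating point.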
Abstract
Current approaches to detecting depression and anxiety from speech primarily rely on machine learning techniques that utilize hand-engineered paralinguistic features and related acoustic descriptors derived from time- and frequency-domain representations of speech signals. Applying deep learning methods directly to raw speech signals has the potential to produce biomarker representations with substantially greater predictive power. However, these approaches typically require large volumes of carefully annotated data to learn robust and clinically meaningful representations of the underlying biomarkers. In this paper, we describe our efforts toward developing a deep learning model trained on a large-scale proprietary dataset comprising ~65,000 utterances collected from more than 23,000 subjects representative of relevant United States demographics. We present the techniques employed and…
