Optimizing Speech-Input Length for Speaker-Independent Depression Classification
Tomasz Rutowski, Amir Harati, Yang Lu, Elizabeth Shriberg

TL;DR
This study investigates how the length of speech inputs affects the accuracy of speaker-independent depression classification models, revealing optimal input lengths and strategies for improved health screening applications.
Contribution
It provides the first detailed analysis of how speech input length affects depression classification performance, guiding the design of speech-based health screening systems.
Findings
Performance depends on a response's natural length, its elapsed length, and its position within the session.
Both systems share a minimum input length threshold.
The systems differ in their response saturation threshold, which is higher for the better-performing system.
Abstract
Machine learning models for speech-based depression classification offer promise for health care applications. Despite growing work on depression classification, little is understood about how the length of speech input affects model performance. We analyze results for speaker-independent depression classification using a corpus of over 1400 hours of speech from a human-machine health screening application. We examine performance as a function of response input length for two NLP systems that differ in overall performance. Results for both systems show that performance depends on natural length, elapsed length, and ordering of the response within a session. Systems share a minimum length threshold, but differ in a response saturation threshold, with the latter higher for the better system. At saturation it is better to pose a new question to the speaker than to continue the current…
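The idea of measuring performance as a function of elapsed response length can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: it assumes length is counted in words, that performance is measured with ROC AUC, and that a classifier is available as a scoring function. The names `truncate_words`, `roc_auc`, `auc_by_elapsed_length`, and `toy_score` are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, Sequence


def truncate_words(text: str, max_words: int) -> str:
    """Keep only the first max_words whitespace-separated tokens."""
    return " ".join(text.split()[:max_words])


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_elapsed_length(
    responses: Sequence[str],
    labels: Sequence[int],
    score_fn: Callable[[str], float],
    word_limits: Sequence[int],
) -> Dict[int, float]:
    """Score each response truncated to each word limit and report AUC per limit."""
    results = {}
    for limit in word_limits:
        scores = [score_fn(truncate_words(r, limit)) for r in responses]
        results[limit] = roc_auc(labels, scores)
    return results


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a trained model: counts two cue words."""
    return float(sum(word in {"tired", "sad"} for word in text.split()))


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    responses = ["fine fine tired sad", "fine good good good",
                 "tired sad sad fine", "good fine fine good"]
    labels = [1, 0, 1, 0]
    print(auc_by_elapsed_length(responses, labels, toy_score, [1, 2, 4]))
```

In a sweep like this, a minimum length threshold would appear as the point below which AUC is near chance, and a saturation threshold as the point beyond which longer inputs stop improving AUC.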
Taxonomy
Topics: Emotion and Mood Recognition
