Toward Knowledge-Driven Speech-Based Models of Depression: Leveraging Spectrotemporal Variations in Speech Vowels
Kexin Feng, Theodora Chaspari

TL;DR
This paper presents a knowledge-driven machine learning approach that leverages spectrotemporal vowel-level speech features to improve depression detection and enhance interpretability for clinical applications.
Contribution
It introduces a novel vowel-based spectrotemporal modeling framework combined with explainability methods for depression detection from speech.
Findings
The proposed method outperforms baseline models that do not integrate vowel-level information
Spectrotemporal information from vowel segments has more impact on the decisions than information from non-vowel segments
The approach provides interpretable insights into temporal speech changes related to depression
Abstract
Psychomotor retardation associated with depression has been linked with tangible differences in vowel production. This paper investigates a knowledge-driven machine learning (ML) method that integrates spectrotemporal information of speech at the vowel level to identify depression. Low-level speech descriptors are learned by a convolutional neural network (CNN) that is trained for vowel classification. The temporal evolution of those low-level descriptors is modeled at a high level, within and across utterances, via a long short-term memory (LSTM) model that makes the final depression decision. A modified version of Local Interpretable Model-agnostic Explanations (LIME) is further used to identify the impact of low-level spectrotemporal vowel variation on the decisions and to observe the high-level temporal change of the depression likelihood. The proposed method outperforms…
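To make the explanation step more concrete, the following is a minimal sketch of a generic LIME-style local surrogate over speech segments. It is not the authors' modified LIME, which the abstract does not specify. The function names, the kernel, the sample count, and the toy scoring function are all assumptions made for illustration. The idea: randomly switch segments on or off, query a black-box score for each perturbed input, and fit a proximity-weighted linear model whose coefficients estimate how much each segment contributes to the score.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def lime_segment_weights(predict, n_segments, n_samples=200,
                         kernel_width=0.75, ridge=1e-6, seed=0):
    """Fit a weighted linear surrogate to `predict` around the full input.

    `predict` takes a 0/1 mask (1 = segment kept) and returns a score.
    Returns one coefficient per segment (the intercept is dropped).
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in range(n_segments)]
        removed = 1.0 - sum(mask) / n_segments  # distance from the full input
        weights.append(math.exp(-(removed ** 2) / kernel_width ** 2))
        rows.append(mask + [1])  # trailing 1 is the intercept column
        targets.append(predict(mask))
    d = n_segments + 1
    # Weighted ridge regression via the normal equations.
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
             + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, rows, targets))
            for i in range(d)]
    return solve_linear_system(xtwx, xtwy)[:n_segments]


def toy_depression_score(mask):
    """Hypothetical black box: segment 1 raises the score, segment 3 lowers it."""
    return 0.2 + 0.5 * mask[1] - 0.1 * mask[3]


segment_weights = lime_segment_weights(toy_depression_score, n_segments=4)
```

In a real setting, `predict` would wrap the trained model and the mask would select which vowel or non-vowel segments are kept. Because the toy score is linear in the mask, the surrogate recovers its coefficients almost exactly, which serves as a sanity check of the fitting code rather than as evidence about depression.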
Taxonomy
Topics: Voice and Speech Disorders
