Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches
Kexin Feng, Theodora Chaspari

TL;DR
This paper presents vowel-based ensemble learning methods for detecting depression from speech, with an emphasis on explainability and robustness. Its approaches decompose the task by symptom and by severity level to improve clinical utility.
Contribution
It combines pre-trained vowel-based embeddings with new ensemble strategies that improve the explainability and robustness of depression classification from speech data.
Findings
Performance comparable to state-of-the-art baselines
Improved robustness, with reduced susceptibility to dataset mean/median values
Improved system explainability for clinical use
Abstract
This study investigates explainable machine learning algorithms for identifying depression from speech. Speech-production research shows that depression affects motor control and vowel generation. Building on this evidence, the study uses pre-trained vowel-based embeddings that integrate semantically meaningful linguistic units. An ensemble learning approach then decomposes the problem into constituent parts characterized by specific depression symptoms and severity levels. Two methods are explored. The first is a "bottom-up" approach in which 8 models each predict one Patient Health Questionnaire-8 (PHQ-8) item score. The second is a "top-down" approach that uses a Mixture of Experts (MoE) with a router module to assess depression severity. Both methods achieve performance comparable to state-of-the-art baselines, demonstrating robustness and reduced susceptibility to dataset mean/median values. System…
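The minimal NumPy sketch below illustrates the two decomposition strategies described in the abstract. It is not the authors' implementation. The per-item predictions and the linear router and expert weights are hypothetical stand-ins for trained models, and the helper names `bottom_up` and `top_down_moe` are invented for illustration. The cutoff of 10 is the commonly used PHQ-8 threshold, assumed here rather than taken from the paper.

In the bottom-up path, each of the 8 item predictions is snapped to a valid 0-3 score, and the scores are summed into a 0-24 total. In the top-down path, a softmax router weights the outputs of several experts. In the paper these experts are associated with severity levels; here they are plain linear regressors.

```python
import numpy as np

PHQ8_ITEMS = 8   # PHQ-8 has eight items
ITEM_MAX = 3     # each item is scored 0-3
CUTOFF = 10      # commonly used PHQ-8 cutoff; an assumption, not from the paper


def bottom_up(item_preds):
    """Combine per-item predictions into a PHQ-8 total and a binary label.

    item_preds: array-like of shape (n_samples, 8), the raw outputs of
    8 hypothetical per-item models.
    """
    item_preds = np.asarray(item_preds, dtype=float)
    if item_preds.shape[-1] != PHQ8_ITEMS:
        raise ValueError("expected 8 item scores per sample")
    # Snap each item prediction to a valid integer score in [0, 3].
    items = np.clip(np.rint(item_preds), 0, ITEM_MAX)
    total = items.sum(axis=-1)  # PHQ-8 total in [0, 24]
    return total, (total >= CUTOFF).astype(int)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def top_down_moe(x, router_w, expert_ws):
    """Soft mixture of experts: router weights blend the expert outputs.

    x:         (n_samples, d) utterance-level embeddings (assumed input)
    router_w:  (d, n_experts) hypothetical linear router weights
    expert_ws: (n_experts, d) one hypothetical linear regressor per expert
    """
    gates = softmax(x @ router_w)  # (n_samples, n_experts); each row sums to 1
    expert_out = x @ expert_ws.T   # (n_samples, n_experts) severity estimates
    return (gates * expert_out).sum(axis=-1), gates
```

For example, `bottom_up([[0.2, 1.4, 2.6, 3.8, 1.0, 0.4, 2.2, 1.1]])` yields a total of 11 and a positive label. The sketch uses a soft (weighted) router for simplicity. A hard top-1 router that selects a single expert is an equally plausible reading of the abstract.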
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Sentiment Analysis and Opinion Mining
