A knowledge-driven vowel-based approach of depression classification from speech using data augmentation
Kexin Feng, Theodora Chaspari

TL;DR
This paper introduces an explainable, vowel-based deep learning approach for depression detection from speech, utilizing data augmentation and modeling temporal dependencies at multiple granularities, achieving competitive results.
Contribution
The study presents a novel vowel-level embedding method with data augmentation for depression classification, enhancing interpretability and temporal modeling in speech analysis.
Findings
Achieves comparable performance to state-of-the-art methods
Provides explainable insights into depression detection
Effective across various temporal granularities
Abstract
We propose a novel explainable machine learning (ML) model that identifies depression from speech by modeling the temporal dependencies across utterances and utilizing the spectrotemporal information at the vowel level. Our method first models the variable-length utterances at the local level into a fixed-size vowel-based embedding using a convolutional neural network with a spatial pyramid pooling layer ("vowel CNN"). Following that, depression is classified at the global level from a group of vowel CNN embeddings that serve as the input of another 1D CNN ("depression CNN"). Different data augmentation methods are designed for the training of both the vowel CNN and the depression CNN. We investigate the performance of the proposed system at various temporal granularities when modeling short, medium, and long analysis windows, corresponding to 10, 21, and 42 utterances, respectively. The…
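The abstract relies on spatial pyramid pooling to turn variable-length input into a fixed-size embedding. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a feature map stored as a list of channels, pooling along the time axis only, max pooling within each bin, and pyramid levels of 1, 2, and 4 bins. The function name `spatial_pyramid_pool_1d` and all parameter choices are hypothetical.

```python
from typing import List, Sequence


def spatial_pyramid_pool_1d(
    feature_map: Sequence[Sequence[float]],
    levels: Sequence[int] = (1, 2, 4),  # assumed pyramid levels, not from the paper
) -> List[float]:
    """Max-pool a (channels x time) feature map into a fixed-length vector.

    For a level with `bins` bins, the time axis of each channel is split into
    `bins` segments (overlapping when the length is not divisible) and the
    maximum of each segment is kept. The output length is
    len(feature_map) * sum(levels), regardless of the time length.
    """
    pooled: List[float] = []
    for bins in levels:
        for channel in feature_map:
            t = len(channel)
            if t == 0:
                raise ValueError("each channel needs at least one time step")
            for b in range(bins):
                start = (b * t) // bins            # floor of the bin's left edge
                end = -(-((b + 1) * t) // bins)    # ceiling of the bin's right edge
                pooled.append(max(channel[start:end]))
    return pooled


# Two inputs with different time lengths map to vectors of the same size.
short_map = [[1.0, 2.0, 3.0], [0.0, 5.0, 1.0]]
long_map = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
            [7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]]
short_vec = spatial_pyramid_pool_1d(short_map)
long_vec = spatial_pyramid_pool_1d(long_map)
```

Both `short_vec` and `long_vec` have 2 × (1 + 2 + 4) = 14 entries, which is what lets a downstream fixed-input network consume segments of different durations.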
Taxonomy
Topics: Emotion and Mood Recognition · Mental Health via Writing · Speech Recognition and Synthesis
Methods: Spatial Pyramid Pooling · 1-Dimensional Convolutional Neural Networks
