Unsupervised Instance Discriminative Learning for Depression Detection from Speech Signals
Jinhan Wang, Vijay Ravi, Jonathan Flint, Abeer Alwan

TL;DR
This paper introduces an unsupervised Instance Discriminative Learning (IDL) approach for depression detection from speech signals, leveraging data augmentation and novel sampling strategies to improve embedding quality and detection accuracy across multiple languages.
Contribution
The paper proposes a modified IDL method with new sampling strategies and data augmentation techniques, improving depression detection from speech without requiring labeled data for pre-training.
Findings
Pseudo Instance-based Sampling improves the spread-out characteristics of the learned embeddings.
Time-masking yields the best augmentation performance.
Significant detection improvements on DAIC-WOZ and CONVERGE datasets.
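The time-masking augmentation named in the findings can be illustrated with a minimal sketch. It assumes the input is a sequence of feature frames (for example, rows of a spectrogram) and zeroes out one random span of consecutive frames. The function name `time_mask`, the `max_width` parameter, and the zero fill value are illustrative assumptions, not the authors' settings.

```python
import random


def time_mask(frames, max_width, rng=None, fill=0.0):
    """Return a copy of `frames` with one random span of consecutive
    time steps replaced by `fill`.

    `frames` is a list of per-frame feature lists (time x features).
    `max_width` is a hypothetical upper bound on the mask length.
    """
    rng = rng or random.Random()
    masked = [list(f) for f in frames]  # copy so the input is left untouched
    n = len(masked)
    if n == 0 or max_width <= 0:
        return masked
    width = rng.randint(1, min(max_width, n))  # mask length in frames
    start = rng.randint(0, n - width)          # first masked frame
    for t in range(start, start + width):
        masked[t] = [fill] * len(masked[t])
    return masked
```

In an instance-discriminative setup, the original segment and its masked copy would be treated as two views of the same instance, so the encoder is pushed toward embeddings that do not change under the mask.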
Abstract
Major Depressive Disorder (MDD) is a severe illness that affects millions of people, and it is critical to diagnose this disorder as early as possible. Detecting depression from voice signals can be of great help to physicians and can be done without any invasive procedure. Since relevant labelled data are scarce, we propose a modified Instance Discriminative Learning (IDL) method, an unsupervised pre-training technique, to extract augment-invariant and instance-spread-out embeddings. In terms of learning augment-invariant embeddings, various data augmentation methods for speech are investigated, and time-masking yields the best performance. To learn instance-spread-out embeddings, we explore methods for sampling instances for a training batch (distinct speaker-based and random sampling). It is found that the distinct speaker-based sampling provides better performance than the random…
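Distinct speaker-based sampling, as described in the abstract, can be sketched as drawing each training batch so that no two instances share a speaker. The sketch below is an assumption about the general idea, not the paper's exact procedure; `distinct_speaker_batch` and the `segments_by_speaker` mapping are hypothetical names.

```python
import random


def distinct_speaker_batch(segments_by_speaker, batch_size, rng=None):
    """Draw a batch of (speaker, segment) pairs with at most one
    segment per speaker.

    `segments_by_speaker` maps a speaker id to a list of that
    speaker's segments (a hypothetical data layout).
    """
    rng = rng or random.Random()
    # Only speakers that actually have segments can contribute.
    speakers = [s for s, segs in segments_by_speaker.items() if segs]
    if batch_size > len(speakers):
        raise ValueError("batch_size exceeds the number of available speakers")
    chosen = rng.sample(speakers, batch_size)  # sampling without replacement
    return [(spk, rng.choice(segments_by_speaker[spk])) for spk in chosen]
```

By contrast, random sampling would draw segments from the pooled data without regard to speaker, so a batch could contain several segments from the same person.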
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Emotion and Mood Recognition
