Unsupervised Instance Discriminative Learning for Depression Detection from Speech Signals

Jinhan Wang; Vijay Ravi; Jonathan Flint; Abeer Alwan

arXiv:2206.13016 · eess.AS · June 28, 2022 · Interspeech

TL;DR

This paper introduces an unsupervised Instance Discriminative Learning approach for depression detection from speech signals, leveraging data augmentation and novel sampling strategies to improve embedding quality and detection accuracy across multiple languages.

Contribution

The paper proposes a modified Instance Discriminative Learning (IDL) method, with new sampling strategies and data augmentation techniques, as an unsupervised pre-training step for depression detection from speech. The pre-training itself requires no labeled data.

Findings

01 Pseudo Instance-based Sampling improves the spread-out property of the embeddings.

02 Time-masking yields the best performance among the augmentation methods investigated.

03 Detection performance improves significantly on the DAIC-WOZ and CONVERGE datasets.
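
To make the time-masking finding concrete, here is a minimal sketch of the general idea: one contiguous run of time frames in a feature sequence is overwritten with a constant. The function name `time_mask`, the `max_width` parameter, and the zero fill value are illustrative assumptions; this page does not state the paper's mask widths, number of masks, or fill values.

```python
import random


def time_mask(frames, max_width, rng=None, fill=0.0):
    """Return a copy of `frames` with one contiguous run of time steps masked.

    frames:    list of feature vectors, one per time step (time x feature).
    max_width: hypothetical upper bound on the mask length, in frames.
    fill:      value written into masked frames (assumed to be 0.0 here).
    """
    if rng is None:
        rng = random.Random()
    n = len(frames)
    if n == 0 or max_width <= 0:
        return [list(f) for f in frames]
    width = rng.randint(0, min(max_width, n))  # mask length, may be 0
    start = rng.randint(0, n - width)          # index of first masked frame
    return [
        [fill] * len(f) if start <= t < start + width else list(f)
        for t, f in enumerate(frames)
    ]


# Usage: a toy 10-frame, 3-dimensional feature sequence.
spec = [[1.0, 2.0, 3.0] for _ in range(10)]
masked = time_mask(spec, max_width=4, rng=random.Random(0))
```

The input is copied rather than modified in place, so the original and the augmented view can both be fed to a model that is trained to produce augment-invariant embeddings.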

Abstract

Major Depressive Disorder (MDD) is a severe illness that affects millions of people, and it is critical to diagnose this disorder as early as possible. Detecting depression from voice signals can be of great help to physicians and can be done without any invasive procedure. Since relevant labelled data are scarce, we propose a modified Instance Discriminative Learning (IDL) method, an unsupervised pre-training technique, to extract augment-invariant and instance-spread-out embeddings. In terms of learning augment-invariant embeddings, various data augmentation methods for speech are investigated, and time-masking yields the best performance. To learn instance-spread-out embeddings, we explore methods for sampling instances for a training batch (distinct speaker-based and random sampling). It is found that the distinct speaker-based sampling provides better performance than the random…
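
The distinct speaker-based sampling mentioned in the abstract can be read as building each training batch so that no speaker contributes more than one segment. The sketch below shows that reading only; the abstract on this page is truncated, so the paper's exact procedure may differ. The function name `distinct_speaker_batches`, its parameters, and the choice to drop leftover segments are assumptions made for illustration.

```python
import random
from collections import defaultdict


def distinct_speaker_batches(segment_speakers, batch_size, rng=None):
    """Yield batches of segment indices in which no speaker appears twice.

    segment_speakers: list where entry i is the speaker ID of segment i.
    batch_size:       number of segments (and thus speakers) per batch.
    """
    if rng is None:
        rng = random.Random()
    # Group segment indices by speaker and shuffle within each speaker.
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(segment_speakers):
        by_speaker[spk].append(idx)
    if batch_size > len(by_speaker):
        raise ValueError("batch_size exceeds the number of distinct speakers")
    for pool in by_speaker.values():
        rng.shuffle(pool)
    while True:
        # Speakers that still have unused segments.
        available = [s for s, pool in by_speaker.items() if pool]
        if len(available) < batch_size:
            return  # leftover segments are dropped in this sketch
        chosen = rng.sample(available, batch_size)
        yield [by_speaker[s].pop() for s in chosen]


# Usage: seven segments from four speakers, batches of three.
speakers = ["a", "a", "b", "b", "c", "c", "d"]
batches = list(distinct_speaker_batches(speakers, 3, random.Random(0)))
```

Random sampling, the baseline named in the abstract, would instead draw segment indices without regard to speaker, so two segments from the same speaker could land in one batch and be treated as different instances.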

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Emotion and Mood Recognition