A Step Towards Preserving Speakers' Identity While Detecting Depression Via Speaker Disentanglement
Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan

TL;DR
This paper introduces adversarial disentanglement of depression characteristics and speaker identity, which improves depression detection from speech while making speakers harder to identify. The approach is evaluated on two datasets (DAIC-WOZ and CONVERGE) with three feature sets.
Contribution
The study proposes a novel adversarial training method that disentangles depression features from speaker identity, enhancing classification accuracy and safeguarding speaker privacy.
Findings
Adversarial training improves depression classification for every feature set compared to the baseline.
Wav2vec2.0 features with adversarial learning achieve the highest F1-scores.
Disentanglement reduces speaker discriminability while enhancing depression detection.
Abstract
Preserving a patient's identity is a challenge for automatic, speech-based diagnosis of mental health disorders. In this paper, we address this issue by proposing adversarial disentanglement of depression characteristics and speaker identity. The model used for depression classification is trained in a speaker-identity-invariant manner by minimizing depression prediction loss and maximizing speaker prediction loss during training. The effectiveness of the proposed method is demonstrated on two datasets - DAIC-WOZ (English) and CONVERGE (Mandarin), with three feature sets (Mel-spectrograms, raw-audio signals, and the last-hidden-state of Wav2vec2.0), using a modified DepAudioNet model. With adversarial training, depression classification improves for every feature when compared to the baseline. Wav2vec2.0 features with adversarial learning resulted in the best performance (F1-score of…
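The training objective described in the abstract (minimize the depression prediction loss, maximize the speaker prediction loss) can be sketched as one combined loss. This is a minimal illustration under stated assumptions, not the authors' implementation: the subtraction form, the weight `lam`, the function names, and the toy logits are all hypothetical choices made for this example.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example from unnormalized class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def adversarial_loss(dep_logits, dep_label, spk_logits, spk_label, lam=0.1):
    """Combined loss: depression loss minus a weighted speaker loss.

    Minimizing the returned total pushes the depression loss down and the
    speaker loss up. `lam` (assumed, not from the source) sets the trade-off.
    """
    dep_loss = cross_entropy(dep_logits, dep_label)
    spk_loss = cross_entropy(spk_logits, spk_label)
    total = dep_loss - lam * spk_loss
    return total, dep_loss, spk_loss


# Toy example: 2 depression classes, 4 speakers (made-up scores).
total, dep_loss, spk_loss = adversarial_loss(
    dep_logits=[0.0, 0.0], dep_label=1,
    spk_logits=[0.0, 0.0, 0.0, 0.0], spk_label=2,
)
print(total, dep_loss, spk_loss)
```

With this form, a model whose representation makes the speaker easy to predict (low speaker loss) receives a higher total loss than one whose speaker predictions are near chance, which is the speaker-invariance pressure the abstract describes.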
Taxonomy
Topics: Mental Health via Writing · Emotion and Mood Recognition · Voice and Speech Disorders
