Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals
Jinhan Wang, Vijay Ravi, Abeer Alwan

TL;DR
This paper introduces a non-uniform adversarial speaker disentanglement method that improves depression detection performance (F1-score) from raw speech signals while reducing the speaker-identification information retained by the model, supporting privacy-preserving diagnostics.
Contribution
It proposes a non-uniform adversarial loss mechanism that improves depression detection performance and reduces speaker identity leakage in speech-based models.
Findings
Achieved an F1-score of 0.7349 on DAIC-WoZ with the ECAPA-TDNN model, a 3.7% improvement over the audio-only state of the art.
Reduced speaker-identification accuracy by 50%, enhancing privacy.
Showed that varying the adversarial weight across model layers is effective, with a greater weight on the initial layers leading to better performance.
Abstract
While speech-based depression detection methods that use speaker-identity features, such as speaker embeddings, are popular, they often compromise patient privacy. To address this issue, we propose a speaker disentanglement method that utilizes a non-uniform mechanism of adversarial speaker-identification (SID) loss maximization. This is achieved by varying the adversarial weight between different layers of a model during training. We find that a greater adversarial weight for the initial layers leads to performance improvement. Our approach using the ECAPA-TDNN model achieves an F1-score of 0.7349 (a 3.7% improvement over audio-only SOTA) on the DAIC-WoZ dataset, while simultaneously reducing the speaker-identification accuracy by 50%. Our findings suggest that identifying depression through speech signals can be accomplished without placing undue reliance on a speaker's identity, paving the way for…
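The non-uniform mechanism described in the abstract can be illustrated with a minimal sketch. It assumes the per-layer objective takes the form L_dep - lambda_l * L_sid, so that a descent step lowers the depression loss while raising the SID loss, with lambda_l larger for the initial layers. The layer names, weight values, gradient numbers, and both helper functions are hypothetical and are not taken from the paper, whose exact formulation is not given in this summary.

```python
from typing import Dict, List


def layer_weights(layer_names: List[str], lam_early: float,
                  lam_late: float, n_early: int) -> Dict[str, float]:
    """Assign lam_early to the first n_early layers and lam_late to the rest.

    Hypothetical helper: the paper's actual grouping of layers and its
    weight values are not stated in this summary.
    """
    return {name: (lam_early if i < n_early else lam_late)
            for i, name in enumerate(layer_names)}


def combine_gradients(grad_dep: Dict[str, List[float]],
                      grad_sid: Dict[str, List[float]],
                      weights: Dict[str, float]) -> Dict[str, List[float]]:
    """Per-layer gradient of the assumed objective L_dep - lambda_l * L_sid.

    Descending along this gradient reduces the depression loss and
    increases the speaker-identification loss, layer by layer.
    """
    combined = {}
    for name, g_dep in grad_dep.items():
        lam = weights[name]
        combined[name] = [gd - lam * gs
                          for gd, gs in zip(g_dep, grad_sid[name])]
    return combined


# Toy example with made-up layer names and gradient values.
layers = ["conv_in", "block1", "block2", "pool"]
weights = layer_weights(layers, lam_early=0.5, lam_late=0.25, n_early=2)
grad_dep = {name: [1.0, -2.0] for name in layers}
grad_sid = {name: [2.0, 4.0] for name in layers}
update = combine_gradients(grad_dep, grad_sid, weights)
```

In this toy setup the first two layers receive twice the adversarial weight of the later ones, so their updates are pushed harder away from speaker-discriminative directions. In a deep-learning framework the same effect is commonly obtained with per-layer gradient scaling or gradient reversal; whether the authors implement it that way is not stated here.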
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Mental Health via Writing
