Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals

Jinhan Wang; Vijay Ravi; Abeer Alwan

arXiv:2306.01861 · eess.AS · June 7, 2023 · Interspeech · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces a non-uniform adversarial speaker disentanglement method that improves depression detection from raw speech signals (F1-score of 0.7349 on DAIC-WoZ) while reducing speaker-identification accuracy by 50%, a step toward privacy-preserving diagnostics.

Contribution

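The paper proposes a non-uniform adversarial loss mechanism, in which the adversarial speaker-identification (SID) weight varies across model layers during training, that improves depression detection performance and reduces speaker identity leakage in speech-based models.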

Findings

01

Achieved an F1-score of 0.7349 on DAIC-WoZ with the ECAPA-TDNN model, a 3.7% improvement over the audio-only state of the art.

02

Reduced speaker-identification accuracy by 50%, enhancing privacy.

03

Demonstrated that varying the adversarial weight across model layers is effective, with a greater weight on the initial layers improving performance.

Abstract

While speech-based depression detection methods that use speaker-identity features, such as speaker embeddings, are popular, they often compromise patient privacy. To address this issue, we propose a speaker disentanglement method that utilizes a non-uniform mechanism of adversarial SID loss maximization. This is achieved by varying the adversarial weight between different layers of a model during training. We find that a greater adversarial weight for the initial layers leads to performance improvement. Our approach using the ECAPA-TDNN model achieves an F1-score of 0.7349 (a 3.7% improvement over audio-only SOTA) on the DAIC-WoZ dataset, while simultaneously reducing the speaker-identification accuracy by 50%. Our findings suggest that identifying depression through speech signals can be accomplished without placing undue reliance on a speaker's identity, paving the way for…
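The abstract describes the method only at a high level: the adversarial speaker-identification (SID) weight differs from layer to layer, with a greater weight on the initial layers. A minimal pure-Python sketch of that idea follows, assuming each layer is updated by descending the depression-loss gradient while ascending the SID-loss gradient scaled by a per-layer weight. The function names, the linear weight schedule, and the toy gradient values are illustrative assumptions; they are not taken from the paper or from the kingformatty/NUSD repository.

```python
from typing import List

Vector = List[float]


def linear_decay_weights(num_layers: int, first: float, last: float) -> List[float]:
    """Hypothetical schedule: adversarial weight moves linearly from `first`
    (initial layer) to `last` (final layer)."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if num_layers == 1:
        return [first]
    step = (last - first) / (num_layers - 1)
    return [first + step * i for i in range(num_layers)]


def nonuniform_adversarial_step(
    params: List[Vector],
    dep_grads: List[Vector],
    sid_grads: List[Vector],
    adv_weights: List[float],
    lr: float = 0.1,
) -> List[Vector]:
    """One update per layer (assumed form):
    theta_l <- theta_l - lr * (grad_dep_l - lambda_l * grad_sid_l).
    Subtracting the SID gradient with a minus sign in front of lambda_l
    pushes the layer to *increase* the speaker-ID loss."""
    if not (len(params) == len(dep_grads) == len(sid_grads) == len(adv_weights)):
        raise ValueError("all inputs must have one entry per layer")
    updated = []
    for theta, g_dep, g_sid, lam in zip(params, dep_grads, sid_grads, adv_weights):
        updated.append(
            [t - lr * (gd - lam * gs) for t, gd, gs in zip(theta, g_dep, g_sid)]
        )
    return updated


if __name__ == "__main__":
    # Toy example: three layers with one parameter each (values are made up).
    weights = linear_decay_weights(3, first=1.0, last=0.0)
    new_params = nonuniform_adversarial_step(
        params=[[1.0], [1.0], [1.0]],
        dep_grads=[[0.5], [0.5], [0.5]],
        sid_grads=[[1.0], [1.0], [1.0]],
        adv_weights=weights,
    )
    print(weights, new_params)
```

With a uniform schedule every entry of `adv_weights` would be identical; the non-uniform variant only changes that list, which keeps the comparison between the two settings simple. In a real framework the same effect is commonly obtained with gradient reversal or per-layer gradient scaling rather than explicit list arithmetic.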

Code & Models

Repositories

kingformatty/NUSD
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Mental Health via Writing