DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection
Yuxin Li, Xiangyu Zhang, Yifei Li, Zhiwei Guo, Haoyang Zhang, Eng Siong Chng, Cuntai Guan

TL;DR
DepFlow is a novel speech synthesis framework that disentangles depression-related features to generate controllable, bias-mitigated speech data, improving depression detection robustness and enabling ethical clinical simulations.
Contribution
We introduce DepFlow, a three-stage disentangled speech generation model that controls depressive severity and mitigates semantic bias in depression detection datasets.
Findings
DepFlow's Depression Acoustic Encoder disentangles depression embeddings from speaker and content while preserving depression discriminability (ROC-AUC: 0.693).
The CDoA dataset improves depression detection macro-F1 by up to 12%.
DepFlow outperforms conventional augmentation in robustness and interpretability.
Abstract
Speech is a scalable and non-invasive biomarker for early mental health screening. However, widely used depression datasets like DAIC-WOZ exhibit strong coupling between linguistic sentiment and diagnostic labels, encouraging models to learn semantic shortcuts. As a result, model robustness may be compromised in real-world scenarios, such as Camouflaged Depression, where individuals maintain socially positive or neutral language despite underlying depressive states. To mitigate this semantic bias, we propose DepFlow, a three-stage depression-conditioned text-to-speech framework. First, a Depression Acoustic Encoder learns speaker- and content-invariant depression embeddings through adversarial training, achieving effective disentanglement while preserving depression discriminability (ROC-AUC: 0.693). Second, a flow-matching TTS model with FiLM modulation injects these embeddings into…
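The abstract mentions FiLM modulation as the way depression embeddings are injected into the TTS model. As a rough illustration of what FiLM (feature-wise linear modulation) does in general, the sketch below scales and shifts each hidden channel with values predicted from a conditioning vector. The function names, toy dimensions, and random weights are assumptions made for illustration; the source does not specify which layers DepFlow modulates or how the projections are parameterized.

```python
import random

def linear(x, weight, bias):
    """Affine map: weight is a list of rows, one row per output dimension."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def film(hidden, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Generic FiLM: per-channel scale (gamma) and shift (beta) predicted from cond.

    hidden: list of frames, each a list of C channel values
    cond:   conditioning vector (standing in for a depression embedding)
    """
    gamma = linear(cond, w_gamma, b_gamma)  # per-channel scale
    beta = linear(cond, w_beta, b_beta)     # per-channel shift
    return [[g * h + b for g, h, b in zip(gamma, frame, beta)] for frame in hidden]

# Toy sizes (hypothetical): 2-dim embedding, 3 channels, 4 frames.
rng = random.Random(0)
emb_dim, channels, frames = 2, 3, 4
w_gamma = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(channels)]
w_beta = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(channels)]
b_gamma = [1.0] * channels  # bias of 1 so a zero embedding gives identity scaling
b_beta = [0.0] * channels   # bias of 0 so a zero embedding gives no shift
hidden = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(frames)]

# A zero embedding leaves the features unchanged; a non-zero one modulates them.
neutral = film(hidden, [0.0, 0.0], w_gamma, b_gamma, w_beta, b_beta)
shifted = film(hidden, [0.5, -0.2], w_gamma, b_gamma, w_beta, b_beta)
```

In this toy setup the same hidden frames (which would carry text and speaker information in a TTS model) yield different outputs depending only on the conditioning vector, which is the property that makes FiLM a natural fit for controllable generation.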
Taxonomy
Topics: Mental Health via Writing · Emotion and Mood Recognition · Digital Mental Health Interventions
