Private Speech Classification without Collapse: Stabilized DP Training and Offline Distillation
Yadi Wen, Tianxin Li, Enji Liang, Rong Du, and Yue Fu

TL;DR
This paper introduces a two-stage privacy-preserving speech classification method: a private multimodal teacher is trained under differential privacy and then distilled offline into an audio-only student. The approach stabilizes training, improves robustness, and addresses the collapse that differentially private training can cause.
Contribution
The paper proposes a two-stage protocol that combines stabilized DP training with offline distillation, so that a private speech classifier can be trained and released without model collapse.
Findings
Under strong privacy, DP training on imbalanced tasks can collapse to a near single-class predictor, and overall accuracy can hide the failure.
The proposed method stabilizes training and maintains privacy guarantees.
Offline distillation improves audio-only model performance and robustness.
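For readers unfamiliar with the DP training referred to in these findings, the following is a minimal sketch of one textbook DP-SGD update (per-example gradient clipping followed by Gaussian noise). It is not the paper's stabilized variant, which this summary does not describe. The function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One plain DP-SGD update (hypothetical helper, not the paper's code).

    per_example_grads has shape (batch_size, dim): one gradient per example.
    """
    batch_size = per_example_grads.shape[0]
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Add Gaussian noise with std noise_multiplier * clip_norm to the summed gradient.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    return params - lr * noisy_mean

# Toy usage: with noise switched off, only the clipping effect is visible.
rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.0, 0.5]])  # norms 5.0 and 0.5
new_params = dp_sgd_step(np.zeros(2), grads, lr=1.0, clip_norm=1.0,
                         noise_multiplier=0.0, rng=rng)
```

In this toy call the first gradient is scaled down to unit norm while the second is left untouched. A larger `noise_multiplier` corresponds to stronger privacy; under strong privacy the noise can swamp the weak gradient signal from minority classes, which is one plausible reading of the collapse described above.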
Abstract
We study example-level private supervised speech classification under a practical release constraint: training may access privileged side information, but the released model must be audio-only. This setting is important because speech systems can often exploit richer side information during development, whereas deployment and release require a lightweight unimodal model with auditable privacy guarantees. Using DP-SGD on the private dataset, we identify a strong-privacy failure mode on imbalanced tasks, where training may collapse to a near single-class predictor, a phenomenon that overall accuracy can obscure. We therefore emphasize Macro-F1, balanced accuracy, and a simple collapse diagnostic. This failure is especially problematic in our release setting because a collapsed private teacher cannot provide useful supervision for the downstream…
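To illustrate why overall accuracy can obscure collapse, here is a small sketch that computes accuracy, balanced accuracy, and Macro-F1 for a classifier that always predicts the majority class. The abstract does not define its collapse diagnostic, so the `dominant_share` quantity below (the fraction of predictions falling in the single most-predicted class) is an assumed stand-in, and `collapse_report` is a hypothetical helper name.

```python
import numpy as np

def collapse_report(y_true, y_pred, num_classes):
    """Accuracy, balanced accuracy, Macro-F1, and an assumed collapse diagnostic."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls, f1s = [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
        if tp + fn > 0:  # balanced accuracy averages recall over classes with support
            recalls.append(recall)
        f1s.append(f1)
    pred_counts = np.bincount(y_pred, minlength=num_classes)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "balanced_accuracy": float(np.mean(recalls)),
        "macro_f1": float(np.mean(f1s)),
        # Assumed diagnostic: share of predictions in the most-predicted class.
        "dominant_share": float(pred_counts.max() / len(y_pred)),
    }

# Imbalanced toy task: 90 negatives, 10 positives, and a collapsed predictor.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
report = collapse_report(y_true, y_pred, num_classes=2)
```

On this toy input the accuracy is 0.90, which looks healthy, while balanced accuracy is 0.50 and Macro-F1 is roughly 0.47; a `dominant_share` of 1.0 flags that every prediction landed in one class.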
