Reference-free Adversarial Sex Obfuscation in Speech
Yangyang Qu, Michele Panariello, Massimiliano Todisco, Nicholas Evans

TL;DR
This paper presents RASO, a novel reference-free method for obfuscating speaker sex in speech by disentangling linguistic content from sex-specific cues using adversarial learning and regularization, enhancing privacy without sacrificing content quality.
Contribution
Introduces RASO, a new adversarial learning framework for sex obfuscation in speech that does not require reference samples and effectively removes sex cues while preserving linguistic content.
Findings
RASO outperforms existing sex obfuscation methods under semi-informed attack models.
It effectively aligns fundamental frequency and formant trajectories to sex-neutral distributions.
RASO maintains linguistic content integrity during obfuscation.
Abstract
Sex conversion in speech involves privacy risks from data collection and often leaves residual sex-specific cues in outputs, even when target speaker references are unavailable. We introduce RASO (Reference-free Adversarial Sex Obfuscation). Its innovations include a sex-conditional adversarial learning framework to disentangle linguistic content from sex-related acoustic markers, and explicit regularisation to align fundamental frequency distributions and formant trajectories with sex-neutral characteristics learned from sex-balanced training data. RASO preserves linguistic content and, even when assessed under a semi-informed attack model, significantly outperforms a competing approach to sex obfuscation.
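To make the idea of aligning fundamental frequency (F0) distributions with sex-neutral characteristics more concrete, here is a minimal sketch. It is not RASO's method: the abstract describes a learned regularisation inside an adversarial framework, whereas this uses a simple log-domain mean and variance normalisation, a common baseline in voice conversion. The function name `align_log_f0`, the convention that 0.0 marks unvoiced frames, and the target statistics are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def align_log_f0(f0_hz, target_mean, target_std):
    """Shift and scale voiced log-F0 values to match target statistics.

    f0_hz: per-frame F0 in Hz, with 0.0 marking unvoiced frames (assumed convention).
    target_mean, target_std: hypothetical sex-neutral log-F0 statistics,
    e.g. estimated from a sex-balanced corpus.
    """
    voiced = [math.log(f) for f in f0_hz if f > 0]
    if not voiced:
        return list(f0_hz)  # nothing to align in a fully unvoiced contour

    src_mean = mean(voiced)
    src_std = pstdev(voiced) or 1.0  # avoid division by zero on flat contours

    aligned = []
    for f in f0_hz:
        if f <= 0:
            aligned.append(0.0)  # keep unvoiced frames unvoiced
        else:
            z = (math.log(f) - src_mean) / src_std
            aligned.append(math.exp(z * target_std + target_mean))
    return aligned


# Illustrative values only: the 165 Hz centre and 0.2 spread are arbitrary
# placeholders, not figures taken from the paper.
contour = [110.0, 0.0, 120.0, 100.0, 0.0, 115.0]
aligned = align_log_f0(contour, math.log(165.0), 0.2)
```

After the transform, the voiced frames have the target log-F0 mean and standard deviation while the voiced/unvoiced pattern is unchanged. A fixed statistical mapping like this only addresses F0; formant trajectories and other sex-related cues would need separate treatment, which is the gap a learned approach aims to close.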
Taxonomy
Topics: Authorship Attribution and Profiling · Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection
