Deep Learning-based F0 Synthesis for Speaker Anonymization
Ünal Ege Gaznepoglu, Nils Peters

TL;DR
This paper introduces a deep learning method that synthesizes F0 trajectories from other speech features for speaker anonymization, improving both speaker anonymity (measured by equal error rate) and speech utility (measured by word error rate).
Contribution
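The paper presents an F0 synthesis approach that reconstructs F0 trajectories from other speech features. It targets two limitations of existing anonymization systems: many leave F0 unmodified, which can reveal speaker identity, and a mismatch between F0 and the other features can degrade speech quality and intelligibility.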
Findings
Improved speaker anonymity as measured by equal error rate.
Enhanced speech utility indicated by lower word error rate.
Effective F0 reconstruction from non-F0 features.
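The two metrics named in the findings can be sketched in a few lines. The following is a minimal illustration only, not the paper's evaluation code: the function names, the threshold sweep, and the toy inputs are assumptions made here for clarity. For anonymization, a higher equal error rate means a speaker verifier has a harder time telling speakers apart, while a lower word error rate means a recognizer still transcribes the speech well.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Hypothetical helper: scores are verifier similarities, where higher
    means "more likely the same speaker".
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor trial scored at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    print(equal_error_rate([0.2, 0.8], [0.3, 0.7]))
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In practice the scores would come from an automatic speaker verification system and the transcripts from a speech recognizer; both are outside the scope of this sketch.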
Abstract
Voice conversion for speaker anonymization is an emerging concept for privacy protection. In a deep learning setting, this is achieved by extracting multiple features from speech, altering the speaker identity, and waveform synthesis. However, many existing systems do not modify fundamental frequency (F0) trajectories, which convey prosody information and can reveal speaker identity. Moreover, mismatch between F0 and other features can degrade speech quality and intelligibility. In this paper, we formally introduce a method that synthesizes F0 trajectories from other speech features and evaluate its reconstructional capabilities. Then we test our approach within a speaker anonymization framework, comparing it to a baseline and a state-of-the-art F0 modification that utilizes speaker information. The results show that our method improves both speaker anonymity, measured by the equal…
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders
