TL;DR
This paper introduces Pivotal Objective Perturbation (POP), a method that adds imperceptible noise to speech data so that TTS models cannot learn to synthesize high-quality voices from it, protecting privacy and security against malicious deepfake generation.
Contribution
The paper proposes a transferable, robust protection technique called POP that effectively impedes TTS models from generating realistic deepfake speech using protected samples.
Findings
POP significantly increases voice synthesis difficulty, raising the unclarity score from 21.94% to 127.31%.
The method demonstrates strong transferability across various TTS models.
POP remains effective against noise reduction and data augmentation techniques.
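A score above 100% can look like a mistake. The summary does not define the unclarity score, so the following is only an illustration under a stated assumption: if the score is a word-error-rate-style metric computed on transcripts of the synthesized speech, it is normalized by the reference length and can exceed 100% when the transcript contains many inserted words. The function name `word_error_rate` and the example strings are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: this stands in for an 'unclarity'-style score; the source
    does not confirm how the paper computes its metric.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # A garbled transcript that is longer than the reference pushes the
    # ratio past 1.0, i.e. past 100%.
    score = word_error_rate("hello world", "x y z w v")
    print(f"{score:.0%}")
```

Because the denominator is the reference length only, a transcript with more wrong words than the reference has words yields a ratio greater than one.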
Abstract
In recent years, it has become possible to perfectly replicate a speaker's voice with just a few speech samples, and malicious voice exploitation (e.g., telecom fraud for illegal financial gain) poses serious hazards in daily life. Therefore, it is crucial to protect publicly accessible speech data that contains sensitive information, such as personal voiceprints. Most previous defense methods have focused on spoofing speaker verification systems in terms of timbre similarity, but the synthesized deepfake speech remains of high quality. In response to the rising hazards, we devise an effective, transferable, and robust proactive protection technology named Pivotal Objective Perturbation (POP) that applies imperceptible error-minimizing noise to original speech samples to prevent them from being effectively learned by text-to-speech (TTS) synthesis models so that high-quality deepfake…
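The idea of error-minimizing noise can be sketched in a few lines. This is a minimal illustration, not the paper's method: a fixed toy linear regressor stands in for the TTS model, mean squared error stands in for whatever objective POP actually perturbs, and the names `error_minimizing_noise`, `eps`, `step`, and `iters` are hypothetical. The noise is chosen to lower the training loss while staying inside a small L-infinity bound, so the perturbed samples look "already learned" and give the model little useful signal.

```python
import numpy as np


def mse_loss(w: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error of a linear model x @ w against targets y."""
    return float(np.mean((x @ w - y) ** 2))


def error_minimizing_noise(w, x, y, eps=0.05, step=0.01, iters=50):
    """Find bounded noise `delta` that REDUCES the loss on (x + delta, y).

    Assumptions (not from the source): linear model, MSE loss, signed
    gradient steps, and an L-infinity budget `eps`.
    """
    delta = np.zeros_like(x)
    for _ in range(iters):
        residual = (x + delta) @ w - y                 # shape (n,)
        # Gradient of the mean squared error with respect to delta.
        grad = 2.0 * np.outer(residual, w) / len(y)    # shape (n, d)
        # Descend on the loss (adversarial noise would ascend instead).
        delta = delta - step * np.sign(grad)
        # Keep the perturbation small so it stays hard to notice.
        delta = np.clip(delta, -eps, eps)
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 3))
    w = np.array([1.0, -2.0, 0.5])
    y = x @ w + 1.0                                    # model is off by 1.0
    delta = error_minimizing_noise(w, x, y)
    print("loss on clean data:    ", mse_loss(w, x, y))
    print("loss on protected data:", mse_loss(w, x + delta, y))
```

In the broader literature on error-minimizing noise, the noise and the model are typically optimized in alternation; the sketch keeps the model fixed to show only the noise update.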
