VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect
Qingyuan Fei, Wenjie Hou, Xuan Hai, Xin Liu

TL;DR
VocalCrypt is an active defense that embeds imperceptible pseudo-timbre into audio to prevent AI voice cloning. The authors report better robustness and speed than existing methods, with no degradation in audio quality.
Contribution
This paper presents VocalCrypt, a preemptive defense that embeds pseudo-timbre into audio. The authors report better robustness and real-time performance against AI voice cloning than existing methods.
Findings
Reports a 500% increase in generation speed over existing methods
Maintains audio quality while effectively preventing voice cloning
Performs well in automatic speaker verification tests
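For readers unfamiliar with automatic speaker verification (ASV), the sketch below shows one common way such a test decides whether two utterances share a speaker: compare fixed-length speaker embeddings by cosine similarity against a threshold. The embedding vectors, the function names, and the 0.7 threshold are illustrative assumptions, not values or code from the paper. A defense would count as effective under this kind of test if cloned audio made from protected speech falls below the threshold against the real speaker's enrollment.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def same_speaker(enrolled, probe, threshold=0.7):
    """Accept the probe as the enrolled speaker if similarity reaches the
    threshold. The threshold here is a hypothetical value; real systems
    tune it on held-out data."""
    return cosine_similarity(enrolled, probe) >= threshold

# Toy embeddings (hypothetical): a close match and an unrelated voice.
enrolled = [1.0, 0.0, 0.0]
print(same_speaker(enrolled, [0.9, 0.1, 0.0]))
print(same_speaker(enrolled, [0.0, 1.0, 0.0]))
```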
Abstract
The rapid advancements in AI voice cloning, fueled by machine learning, have significantly impacted text-to-speech (TTS) and voice conversion (VC) fields. While these developments have led to notable progress, they have also raised concerns about the misuse of AI VC technology, causing economic losses and negative public perceptions. To address this challenge, this study focuses on creating active defense mechanisms against AI VC systems. We propose a novel active defense method, VocalCrypt, which embeds pseudo-timbre (jamming information) based on SFS into audio segments that are imperceptible to the human ear, thereby forming systematic fragments to prevent voice cloning. This approach protects the voice without compromising its quality. In comparison to existing methods, such as adversarial noise incorporation, VocalCrypt significantly enhances robustness and real-time performance,…
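The abstract's core idea, hiding jamming information where the host audio makes it inaudible, can be illustrated with a deliberately crude sketch. The code below is not the VocalCrypt algorithm: it replaces a real psychoacoustic masking model with a simple assumption that a signal kept a fixed number of decibels below the host's level in each frame is hard to hear. The function name, the frame length, and the 20 dB margin are all hypothetical choices made for illustration.

```python
import math

def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def embed_below_host_level(host, jam, frame_len=512, margin_db=20.0):
    """Mix `jam` into `host` frame by frame, scaling each jam frame so its
    RMS sits `margin_db` below the host frame's RMS.

    Assumption: a fixed level margin stands in for a masking threshold.
    Returns (mixed, scaled_jam), both truncated to whole frames.
    """
    n = min(len(host), len(jam))
    n -= n % frame_len  # keep whole frames only
    atten = 10 ** (-margin_db / 20.0)
    mixed, scaled = [], []
    for start in range(0, n, frame_len):
        h = host[start:start + frame_len]
        j = jam[start:start + frame_len]
        j_rms = rms(j)
        # Silent jam frames contribute nothing.
        gain = rms(h) * atten / j_rms if j_rms > 0 else 0.0
        s = [gain * v for v in j]
        scaled.extend(s)
        mixed.extend(a + b for a, b in zip(h, s))
    return mixed, scaled

# Toy signals: a 440 Hz "voice" tone and a 3 kHz jamming tone at 16 kHz.
sr = 16000
host = [0.5 * math.sin(2 * math.pi * 440 * i / sr) for i in range(4096)]
jam = [math.sin(2 * math.pi * 3000 * i / sr) for i in range(4096)]
mixed, scaled = embed_below_host_level(host, jam)
print(round(20 * math.log10(rms(host) / rms(scaled)), 3))  # level gap in dB
```

Because the gain is set per frame, the jamming level follows the host's loudness, so quiet passages receive proportionally less added signal. A faithful implementation would instead derive per-frequency masking thresholds from the host spectrum.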
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
