VocalCrypt: Novel Active Defense Against Deepfake Voice Based on Masking Effect

Qingyuan Fei; Wenjie Hou; Xuan Hai; Xin Liu

arXiv:2502.10329 · cs.SD · February 17, 2025

TL;DR

VocalCrypt introduces an active defense that embeds imperceptible pseudo-timbre into audio to prevent AI voice cloning, significantly improving robustness and speed over existing methods without degrading audio quality.

Contribution

This paper presents VocalCrypt, a novel preemptive defense technique that embeds pseudo-timbre into audio and offers superior robustness and real-time performance against AI voice cloning.

Findings

01. Generates protected audio 500% faster than existing methods.
02. Maintains audio quality while effectively preventing voice cloning.
03. Performs well in automatic speaker verification (ASV) tests.

Abstract

The rapid advancements in AI voice cloning, fueled by machine learning, have significantly impacted text-to-speech (TTS) and voice conversion (VC) fields. While these developments have led to notable progress, they have also raised concerns about the misuse of AI VC technology, causing economic losses and negative public perceptions. To address this challenge, this study focuses on creating active defense mechanisms against AI VC systems. We propose a novel active defense method, VocalCrypt, which embeds pseudo-timbre (jamming information) based on SFS into audio segments that are imperceptible to the human ear, thereby forming systematic fragments to prevent voice cloning. This approach protects the voice without compromising its quality. In comparison to existing methods, such as adversarial noise incorporation, VocalCrypt significantly enhances robustness and real-time performance,…
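The abstract says the jamming information is embedded where the human ear cannot perceive it, relying on the masking effect. This page does not describe VocalCrypt's actual psychoacoustic model, so the following is only a generic sketch of simultaneous frequency masking under stated assumptions: the input is one frame's magnitude spectrum in dB, a masker hides anything more than `offset_db` below it in its own bin, and its influence falls off linearly by `slope_db_per_bin` per bin of distance. The names `masking_threshold` and `clamp_jamming` and all default values are hypothetical, not taken from the paper.

```python
from typing import List


def masking_threshold(spectrum_db: List[float], offset_db: float = 10.0,
                      slope_db_per_bin: float = 3.0) -> List[float]:
    """Crude simultaneous-masking threshold for one spectral frame.

    Hypothetical model: every bin j acts as a masker whose threshold at
    bin k is its level minus offset_db, lowered by slope_db_per_bin for
    each bin of distance |k - j|. The threshold at bin k is the highest
    contribution from any masker (including bin k itself).
    """
    return [
        max(level - offset_db - slope_db_per_bin * abs(k - j)
            for j, level in enumerate(spectrum_db))
        for k in range(len(spectrum_db))
    ]


def clamp_jamming(jam_db: List[float], threshold_db: List[float],
                  margin_db: float = 2.0) -> List[float]:
    """Limit a requested jamming spectrum so every bin stays at least
    margin_db below the masking threshold (hypothetical safety margin)."""
    return [min(j, t - margin_db) for j, t in zip(jam_db, threshold_db)]


if __name__ == "__main__":
    # One loud 60 dB masker in bin 0 over a 20 dB floor.
    frame = [60.0, 20.0, 20.0, 20.0, 20.0]
    thr = masking_threshold(frame)
    # Request a flat 45 dB jamming component and trim it to stay masked.
    print(thr, clamp_jamming([45.0] * len(frame), thr))
```

With these defaults, a 60 dB masker in bin 0 over a 20 dB floor yields thresholds of [50, 47, 44, 41, 38] dB, and a flat 45 dB jamming request is trimmed to [45, 45, 42, 39, 36] dB, so the added energy shrinks as it moves away from the masker. Standard psychoacoustic models, such as those in MPEG-1 Audio, compute spreading on the Bark scale rather than in raw bins; the linear-in-bins falloff here is a simplification to keep the example short.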

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis