CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning

Renyuan Li; Zhibo Liang; Haichuan Zhang; Tianyu Shi; Zhiyuan Cheng; Jia Shi; Carl Yang; Mingjie Tang

arXiv:2505.19119·cs.SD·May 27, 2025

TL;DR

CloneShield is a universal adversarial perturbation framework that protects against zero-shot voice cloning by degrading the quality of cloned speech while preserving the perceptual quality of the original audio.

Contribution

We propose a multi-objective optimization approach, based on the Multi-Gradient Descent Algorithm (MGDA), for generating robust, imperceptible adversarial perturbations against zero-shot voice cloning systems.
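To make the MGDA idea concrete, here is a minimal sketch of the standard closed-form MGDA step for two objectives. It is an illustration under stated assumptions, not the paper's implementation: the function name `min_norm_two` and the toy gradients are hypothetical, and the source does not say which losses or how many objectives CloneShield combines. The step finds the minimum-norm point in the convex hull of the two gradients, which gives an update direction that does not increase either objective to first order.

```python
import numpy as np


def min_norm_two(g1, g2):
    """Two-objective MGDA step (hypothetical helper, not from the paper).

    Returns (gamma, d) where d = gamma * g1 + (1 - gamma) * g2 is the
    minimum-norm convex combination of the two gradients.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        # Identical gradients: any weighting gives the same direction.
        return 0.5, g1.copy()
    # Unconstrained minimizer of ||g2 + gamma * (g1 - g2)||^2, clipped to [0, 1].
    gamma = float(np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0))
    return gamma, gamma * g1 + (1.0 - gamma) * g2


# Toy gradients standing in for two per-utterance losses (assumed values).
gamma, d = min_norm_two([1.0, 0.0], [0.0, 1.0])
```

A descent update would then move the perturbation along `-d`; with more than two objectives the same min-norm problem is usually solved iteratively rather than in closed form.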

Findings

01. Significantly degrades speaker similarity in cloned speech.

02. Maintains high audio quality for protected inputs (PESQ = 3.90).

03. Effective across multiple TTS systems and datasets.

Abstract

Recent breakthroughs in text-to-speech (TTS) voice cloning have raised serious privacy concerns, allowing highly accurate vocal identity replication from just a few seconds of reference audio while retaining the speaker's vocal authenticity. In this paper, we introduce CloneShield, a universal time-domain adversarial perturbation framework specifically designed to defend against zero-shot voice cloning. Our method provides protection that is robust across speakers and utterances, without requiring any prior knowledge of the synthesized text. We formulate perturbation generation as a multi-objective optimization problem and propose using the Multi-Gradient Descent Algorithm (MGDA) to ensure robust protection across diverse utterances. To preserve natural auditory perception for users, we decompose the adversarial perturbation via Mel-spectrogram representations and fine-tune it for each…
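As a rough illustration of what "universal" and "time-domain" mean in the abstract, the sketch below adds one fixed perturbation vector to raw waveforms of different lengths. Everything specific here is an assumption made for the example and does not come from the paper: the function name `apply_universal_perturbation`, the L-infinity budget `eps`, the tiling of the perturbation to match the utterance length, and the [-1, 1] sample range.

```python
import numpy as np


def apply_universal_perturbation(wave, delta, eps=0.01):
    """Add one shared perturbation to a waveform (hypothetical helper).

    wave  : 1-D float array of audio samples, assumed to lie in [-1, 1]
    delta : 1-D float array, the same perturbation reused for every utterance
    eps   : assumed L-infinity budget that keeps the perturbation quiet
    """
    wave = np.asarray(wave, dtype=float)
    delta = np.clip(np.asarray(delta, dtype=float), -eps, eps)
    # Repeat the perturbation until it covers the utterance, then crop.
    reps = -(-len(wave) // len(delta))  # ceiling division
    tiled = np.tile(delta, reps)[: len(wave)]
    # Keep the protected audio inside the valid sample range.
    return np.clip(wave + tiled, -1.0, 1.0)


# The same delta protects utterances of different lengths (toy data).
delta = np.array([0.5, -0.5])
short = apply_universal_perturbation(np.zeros(3), delta, eps=0.1)
long = apply_universal_perturbation(np.zeros(5), delta, eps=0.1)
```

Because `delta` does not depend on the input, it can be computed once and applied to any recording before it is shared, which is what distinguishes a universal perturbation from a per-utterance one.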
