VocalBridge: Latent Diffusion-Bridge Purification for Defeating Perturbation-Based Voiceprint Defenses

Maryam Abbasihafshejani; AHM Nazmus Sakib; Murtuza Jadliwala

arXiv:2601.02444 · cs.SD · January 7, 2026

Open Access

TL;DR

This paper introduces VocalBridge, a latent diffusion-based purification method that effectively removes protective perturbations from speech, exposing vulnerabilities in current voiceprint defenses against cloning and verification attacks.

Contribution

VocalBridge is a latent diffusion-bridge framework that purifies perturbed speech in the EnCodec latent space. It does not require transcripts and outperforms existing purification methods at recovering cloneable voices.
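To make the idea of purifying in a latent space concrete, here is a minimal toy sketch of the generic encode, add noise, denoise, decode loop. It is not the paper's diffusion bridge and it does not call EnCodec: `encode`, `decode`, `toy_denoiser`, and `purify`, along with the parameters `frame`, `noise_level`, and `steps`, are all hypothetical stand-ins chosen for illustration, and the smoothing filter merely takes the place of a learned denoising model.

```python
import numpy as np

def encode(wave, frame=4):
    # Hypothetical stand-in for a codec encoder (NOT EnCodec):
    # chop the waveform into fixed-size frames, dropping any remainder.
    usable = len(wave) - len(wave) % frame
    return wave[:usable].reshape(-1, frame)

def decode(latent):
    # Hypothetical stand-in for a codec decoder: flatten frames back to samples.
    return latent.reshape(-1)

def toy_denoiser(z):
    # Placeholder for a learned denoising model: a 3-tap average across
    # neighbouring frames, with edge padding so the shape is preserved.
    padded = np.pad(z, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

def purify(wave, noise_level=0.1, steps=5, seed=0):
    # Generic purification loop (an assumption, not the paper's algorithm):
    # 1) map to the latent space, 2) inject Gaussian noise,
    # 3) apply the denoiser repeatedly, 4) map back to a waveform.
    rng = np.random.default_rng(seed)
    z = encode(wave)
    z = z + noise_level * rng.standard_normal(z.shape)
    for _ in range(steps):
        z = toy_denoiser(z)
    return decode(z)

# Usage: a slow sine wave plus a small stand-in "protective" perturbation.
clean = np.sin(np.linspace(0.0, 20.0, 1002))
perturbed = clean + 0.05 * np.random.default_rng(1).standard_normal(clean.shape)
purified = purify(perturbed)
```

In a real system the framing and flattening would be replaced by a neural codec and the averaging filter by a trained generative model. The sketch only shows where each stage sits in the pipeline.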

Findings

01. Outperforms existing purification methods in recovering cloneable voices

02. Demonstrates fragility of current perturbation-based defenses

03. Highlights need for more robust voice protection mechanisms

Abstract

The rapid advancement of speech synthesis technologies, including text-to-speech (TTS) and voice conversion (VC), has intensified security and privacy concerns related to voice cloning. Recent defenses attempt to prevent unauthorized cloning by embedding protective perturbations into speech to obscure speaker identity while maintaining intelligibility. However, adversaries can apply advanced purification techniques to remove these perturbations, recover authentic acoustic characteristics, and regenerate cloneable voices. Despite the growing realism of such attacks, the robustness of existing defenses under adaptive purification remains insufficiently studied. Most existing purification methods are designed to counter adversarial noise in automatic speech recognition (ASR) systems rather than speaker verification or voice cloning pipelines. As a result, they fail to suppress the…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis · Topic Modeling