RoVo: Robust Voice Protection Against Unauthorized Speech Synthesis with Embedding-Level Perturbations

Seungmin Kim; Sohee Park; Donghyun Kim; Jisu Lee; Daeseon Choi

arXiv:2505.12686 · cs.LG · May 20, 2025

Open Access

TL;DR

RoVo is a proactive defense that injects adversarial perturbations into the embedding vectors of audio, rather than directly into the audio signal, to protect against speech synthesis attacks. The protection remains effective after speech enhancement is applied.

Contribution

RoVo introduces embedding-level perturbations for robust voice protection, outperforming existing methods and resisting speech enhancement used as a secondary attack.
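
The idea of perturbing an embedding and then reconstructing audio from it can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: `encode` and `decode` are a hypothetical orthonormal pairwise transform standing in for a learned audio encoder and decoder, `speaker_score` is a hypothetical linear similarity to a speaker direction `w`, and the update is a single sign-gradient (FGSM-style) step with budget `eps`.

```python
import math

SQRT2 = math.sqrt(2.0)


def encode(x):
    """Toy encoder (assumption): orthonormal pairwise sum/difference transform."""
    if len(x) % 2:
        raise ValueError("toy encoder expects an even number of samples")
    z = []
    for i in range(0, len(x), 2):
        a, b = x[i], x[i + 1]
        z += [(a + b) / SQRT2, (a - b) / SQRT2]
    return z


def decode(z):
    """Toy decoder: this particular transform is its own inverse."""
    return encode(z)


def speaker_score(x, w):
    """Hypothetical speaker similarity: dot product of the embedding with w."""
    return sum(zi * wi for zi, wi in zip(encode(x), w))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def protect(x, w, eps):
    """Perturb the embedding (not the waveform), then reconstruct the signal."""
    z = encode(x)
    # For a linear score the gradient w.r.t. z is w, so stepping against
    # sign(w) lowers the score while changing each coordinate by at most eps.
    z_adv = [zi - eps * sign(wi) for zi, wi in zip(z, w)]
    return decode(z_adv)


if __name__ == "__main__":
    x = [0.5, -0.25, 0.1, 0.3]   # stand-in for audio samples
    w = [1.0, -2.0, 0.5, 0.0]    # hypothetical speaker direction
    protected = protect(x, w, eps=0.05)
    print("score before:", speaker_score(x, w))
    print("score after: ", speaker_score(protected, w))
```

In a real system the encoder, decoder, and speaker objective would be learned networks and the optimization would likely be iterative. The sketch only shows where the perturbation budget is applied: in embedding space, with the protected waveform obtained by decoding.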

Findings

01

Increased Defense Success Rate (DSR) by over 70% against state-of-the-art speech synthesis models

02

Achieved 99.5% DSR on a commercial speaker-verification API

03

Perturbations remain effective under strong speech enhancement conditions
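
The text shown here does not define DSR. As a rough illustration, assuming DSR is the fraction of synthesized (spoofed) samples that a speaker verifier rejects, it could be computed as in the sketch below. The function name, the similarity scores, and the threshold are all hypothetical.

```python
def defense_success_rate(similarities, threshold):
    """Fraction of spoofed samples whose verifier score falls below threshold.

    Assumption: a sample counts as a defense success when the speaker
    verifier rejects it, i.e. its similarity score is below the threshold.
    """
    if not similarities:
        raise ValueError("need at least one similarity score")
    rejected = sum(1 for s in similarities if s < threshold)
    return rejected / len(similarities)


if __name__ == "__main__":
    # Hypothetical verifier scores for four synthesized utterances.
    scores = [0.12, 0.31, 0.78, 0.05]
    print(defense_success_rate(scores, threshold=0.5))  # 3 of 4 rejected -> 0.75
```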

Abstract

With the advancement of AI-based speech synthesis technologies such as Deep Voice, there is an increasing risk of voice spoofing attacks, including voice phishing and fake news, through unauthorized use of others' voices. Existing defenses that inject adversarial perturbations directly into audio signals have limited effectiveness, as these perturbations can easily be neutralized by speech enhancement methods. To overcome this limitation, we propose RoVo (Robust Voice), a novel proactive defense technique that injects adversarial perturbations into high-dimensional embedding vectors of audio signals, reconstructing them into protected speech. This approach effectively defends against speech synthesis attacks and also provides strong resistance to speech enhancement models, which represent a secondary attack threat. In extensive experiments, RoVo increased the Defense Success Rate…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection