Erasing Your Voice Before It's Heard: Training-free Speaker Unlearning for Zero-shot Text-to-Speech

Myungjin Lee; Eunji Shin; Jiyoung Lee

arXiv:2601.20481·eess.AS·January 29, 2026

TL;DR

TruS is a training-free framework for speaker unlearning in zero-shot TTS. It suppresses specific voices at inference time, so the model does not need to be retrained to stop generating them, which improves privacy and safety.

Contribution

TruS introduces an inference-time control method for speaker unlearning that requires no retraining and also applies to speakers not seen in the training set.

Findings

01. Effectively prevents voice synthesis of targeted speakers.

02. Works on both seen and unseen speakers.

03. Maintains other speech attributes like prosody and emotion.

Abstract

Modern zero-shot text-to-speech (TTS) models offer unprecedented expressivity but also pose serious crime risks, as they can synthesize voices of individuals who never consented. In this context, speaker unlearning aims to prevent the generation of specific speaker identities upon request. Existing approaches, reliant on retraining, are costly and limited to speakers seen in the training set. We present TruS, a training-free speaker unlearning framework that shifts the paradigm from data deletion to inference-time control. TruS steers identity-specific hidden activations to suppress target speakers while preserving other attributes (e.g., prosody and emotion). Experimental results show that TruS effectively prevents voice generation on both seen and unseen opt-out speakers, establishing a scalable safeguard for speech synthesis. The demo and code are available on…
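The abstract says TruS steers identity-specific hidden activations but does not spell out how. As a rough illustration of what inference-time activation steering can look like in general, the sketch below estimates a speaker direction as the difference between mean activations and removes a hidden vector's component along it. This is an assumed, generic technique (difference-of-means direction plus projection removal), not the authors' algorithm; the function names, the `strength` parameter, and the toy three-dimensional activations are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def identity_direction(target_acts, reference_acts):
    """Unit vector pointing from the reference mean to the target-speaker mean.

    Assumption: speaker identity is roughly captured by one linear direction.
    """
    target_mean = mean_vector(target_acts)
    reference_mean = mean_vector(reference_acts)
    diff = [t - r for t, r in zip(target_mean, reference_mean)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("target and reference means coincide")
    return [d / norm for d in diff]


def steer(hidden, direction, strength=1.0):
    """Subtract `strength` times the projection of `hidden` onto `direction`.

    With strength=1.0 the result has no component along `direction`;
    components orthogonal to it are left unchanged.
    """
    projection = sum(h * d for h, d in zip(hidden, direction))
    return [h - strength * projection * d for h, d in zip(hidden, direction)]


# Toy data (hypothetical): activations differ only in the first coordinate.
target_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
reference_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

direction = identity_direction(target_acts, reference_acts)
steered = steer([5.0, 2.0, -1.0], direction)
```

In this toy case only the coordinate aligned with the estimated direction changes, which mirrors the stated goal of suppressing identity while leaving other attributes intact. In a real model the same operation would be applied to selected hidden layers during generation, and which layers and directions to use is exactly what the paper would have to specify.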

Taxonomy

Topics: Speech Recognition and Synthesis · Hate Speech and Cyberbullying Detection · Mental Health via Writing