Traceable TTS: Toward Watermark-Free TTS with Strong Traceability
Yuxiang Zhao, Yunchong Xiao, Yushen Chen, Zhikang Niu, Shuai Wang, Kai Yu, Xie Chen

TL;DR
This paper introduces a watermark-free TTS framework that improves traceability while preserving, and even slightly improving, speech quality, addressing security concerns around synthetic speech.
Contribution
It proposes a joint training method for the TTS model and a discriminator that achieves strong traceability without degrading speech quality, pioneering watermark-free traceable TTS.
Findings
Improved traceability generalization in TTS models
Preserved or enhanced speech quality
First watermark-free approach with strong traceability
Abstract
Recent advances in Text-To-Speech (TTS) technology have enabled synthetic speech to mimic human voices with remarkable realism, raising significant security concerns. This underscores the need for traceable TTS models: systems capable of tracing their synthesized speech without compromising quality or security. However, existing methods predominantly rely on explicit watermarking of the speech or the vocoder, which degrades speech quality and is vulnerable to spoofing. To address these limitations, we propose a novel framework for model attribution. Instead of embedding watermarks, we train the TTS model and discriminator using a joint training method that significantly improves traceability generalization while preserving (and even slightly improving) audio quality. This is the first work toward watermark-free TTS with strong traceability. To promote progress in related fields, we will release…
