Speech Watermarking with Discrete Intermediate Representations
Shengpeng Ji, Ziyue Jiang, Jialong Zuo, Minghui Fang, Yifu Chen, Tao Jin, Zhou Zhao

TL;DR
DiscreteWM is a novel speech watermarking framework that embeds watermarks into discrete intermediate representations of speech, achieving high robustness and imperceptibility while encoding up to 150 bits per second.
Contribution
It introduces a discrete latent space watermarking method using vector-quantized autoencoders and a token manipulation strategy for imperceptibility.
Findings
Achieves state-of-the-art robustness and imperceptibility
Can encode 1 to 150 bits of watermark per second
Effective for voice cloning detection and information hiding
Abstract
Speech watermarking techniques can proactively mitigate the potential harmful consequences of instant voice cloning techniques. These techniques involve the insertion of signals into speech that are imperceptible to humans but can be detected by algorithms. Previous approaches typically embed watermark messages into continuous space. However, intuitively, embedding watermark information into robust discrete latent space can significantly improve the robustness of watermarking systems. In this paper, we propose DiscreteWM, a novel speech watermarking framework that injects watermarks into the discrete intermediate representations of speech. Specifically, we map speech into discrete latent space with a vector-quantized autoencoder and inject watermarks by changing the modular arithmetic relation of discrete IDs. To ensure the imperceptibility of watermarks, we also propose a manipulator…
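The idea of "changing the modular arithmetic relation of discrete IDs" can be illustrated with a toy sketch. The code below is not the authors' method: it assumes one watermark bit is carried by the parity (ID mod 2) of every token in a frame sequence, and that swapping a token for its numerically adjacent codebook entry is acceptable. The function names `embed_bit` and `detect_bit`, the codebook size of 1024, and the majority-vote detector are all hypothetical choices made for illustration. The manipulator mentioned in the abstract, which is meant to keep the change imperceptible, is not modelled here.

```python
def embed_bit(token_ids, bit, codebook_size):
    """Return a copy of token_ids in which every ID has parity equal to `bit`.

    Assumption (illustrative only): shifting an ID by one codebook index
    is an acceptable substitution. A real system would choose replacements
    that preserve audio quality.
    """
    marked = []
    for t in token_ids:
        if t % 2 != bit:
            # Move to a neighbouring ID, staying inside the codebook range.
            t = t + 1 if t + 1 < codebook_size else t - 1
        marked.append(t)
    return marked


def detect_bit(token_ids):
    """Recover the bit by majority vote over token parities.

    Returns (bit, agreement), where agreement is the fraction of tokens
    whose parity matches the decoded bit.
    """
    odd = sum(t % 2 for t in token_ids)
    bit = 1 if 2 * odd > len(token_ids) else 0
    matches = odd if bit == 1 else len(token_ids) - odd
    return bit, matches / len(token_ids)


# Hypothetical token IDs from a codebook with 1024 entries.
ids = [3, 8, 15, 1023, 0, 42]
marked_one = embed_bit(ids, 1, 1024)
marked_zero = embed_bit(ids, 0, 1024)

# Simulate a distortion that corrupts one token after watermarking.
noisy = list(marked_one)
noisy[0] = 4
decoded_bit, agreement = detect_bit(noisy)
```

Because the bit is spread over many tokens, the majority vote in this sketch still decodes correctly when a minority of tokens is altered, which gives some intuition for why a discrete representation might tolerate distortion.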
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Advanced Data Compression Techniques · Music and Audio Processing
