Robust Distortion-Free Watermark for Autoregressive Audio Generation Models

Yihan Wu; Georgios Milis; Ruibo Chen; Heng Huang

arXiv:2510.21115 · cs.SD · October 27, 2025

TL;DR

This paper introduces Aligned-IS, a new distortion-free watermarking method for autoregressive audio models that enhances security and detectability without compromising audio quality.

Contribution

The paper presents Aligned-IS, a clustering-based, distortion-free watermark designed to overcome the retokenization mismatch that hinders statistical watermarking of autoregressive audio models.

Findings

01. Aligned-IS outperforms existing methods in detectability.

02. It maintains high audio quality post-watermarking.

03. It sets a new benchmark for secure audio generation.

Abstract

The rapid advancement of next-token-prediction models has led to widespread adoption across modalities, enabling the creation of realistic synthetic media. In the audio domain, while autoregressive speech models have propelled conversational interactions forward, the potential for misuse, such as impersonation in phishing schemes or crafting misleading speech recordings, has also increased. Security measures such as watermarking have thus become essential to ensuring the authenticity of digital media. Traditional statistical watermarking methods used for autoregressive language models face challenges when applied to autoregressive audio models, due to the inevitable "retokenization mismatch": the discrepancy between original and retokenized discrete audio token sequences. To address this, we introduce Aligned-IS, a novel, distortion-free watermark, specifically crafted for audio…
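To make the "retokenization mismatch" in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's method or tokenizer: the 1-D `codebook`, the Gaussian perturbation, and the helper names `quantize` and `mismatch_rate` are all assumptions chosen for illustration. It generates a token sequence, decodes it to values, perturbs them slightly (standing in for a lossy decode and re-encode round trip), retokenizes, and measures how many tokens changed.

```python
import random


def mismatch_rate(original, retokenized):
    """Fraction of positions where the two token sequences differ."""
    if len(original) != len(retokenized):
        raise ValueError("sequences must have equal length")
    if not original:
        return 0.0
    return sum(a != b for a, b in zip(original, retokenized)) / len(original)


def quantize(values, codebook):
    """Map each value to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - v))
        for v in values
    ]


# Hypothetical 1-D codebook standing in for a learned audio tokenizer.
codebook = [i / 16 for i in range(16)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Tokens as the generator would emit them.
tokens = [rng.randrange(len(codebook)) for _ in range(1000)]

# Decode to "audio" values and add small noise (assumed stand-in for
# the distortion of a decode / re-encode round trip).
decoded = [codebook[t] + rng.gauss(0, 0.03) for t in tokens]

# Retokenize the perturbed signal and compare with the original tokens.
retokenized = quantize(decoded, codebook)
rate = mismatch_rate(tokens, retokenized)
print(f"retokenization mismatch rate: {rate:.2%}")
```

A detector that relies on the exact generated token sequence sees a different sequence after this round trip. The abstract identifies this as the difficulty when statistical watermarks designed for language models are applied to audio.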
