VocBulwark: Towards Practical Generative Speech Watermarking via Additional-Parameter Injection

Weizhi Liu; Yue Li; Zhaoxia Yin

arXiv:2601.22556·cs.CR·February 2, 2026

TL;DR

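VocBulwark watermarks generated speech by injecting additional parameters into a frozen generative model, preserving audio quality while staying robust to attacks. Earlier methods either superimpose the watermark in the noise space or intrusively alter model weights, and so trade fidelity against robustness.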

Contribution

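The paper presents VocBulwark, a framework that embeds watermarks through additional injected parameters while keeping the generative model's own parameters frozen. It combines a Temporal Adapter and a Coarse-to-Fine Gated Extractor with an Accuracy-Guided Optimization Curriculum that balances fidelity against robustness.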
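As a rough illustration of additional-parameter injection, the sketch below keeps a stand-in "generative" layer fixed and routes the watermark through a separate, zero-initialized branch. It is not the paper's Temporal Adapter: `frozen_layer`, `WatermarkAdapter`, the weights, and the dimensions are all hypothetical, chosen only to show why a frozen base model can leave output quality untouched until the added parameters are trained.

```python
from typing import List


def frozen_layer(x: List[float]) -> List[float]:
    """Stand-in for a frozen, pretrained generator layer (weights never updated)."""
    weights = [[0.5, -0.2], [0.1, 0.8]]  # fixed, illustrative values
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class WatermarkAdapter:
    """Hypothetical extra-parameter branch; the only part that would be trained."""

    def __init__(self, n_bits: int, dim: int) -> None:
        # Zero initialization: the adapter starts as an exact no-op,
        # so the frozen model's output is unchanged before training.
        self.proj = [[0.0] * n_bits for _ in range(dim)]

    def __call__(self, hidden: List[float], bits: List[int]) -> List[float]:
        signs = [2 * b - 1 for b in bits]  # map {0, 1} to {-1, +1}
        return [
            h + sum(p * s for p, s in zip(row, signs))
            for h, row in zip(hidden, self.proj)
        ]


def watermarked_forward(
    x: List[float], bits: List[int], adapter: WatermarkAdapter
) -> List[float]:
    """Frozen layer output plus the adapter's watermark-conditioned residual."""
    return adapter(frozen_layer(x), bits)
```

With the adapter at its zero initialization, `watermarked_forward` returns exactly the frozen layer's output; only updates to `adapter.proj` can change the signal.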

Findings

01. Achieves high-capacity, high-fidelity watermarking.

02. Resists codec regeneration and variable-length manipulations.

03. Maintains perceptual quality while resisting advanced attacks.

Abstract

Generated speech achieves human-level naturalness but escalates security risks of misuse. However, existing watermarking methods fail to reconcile fidelity with robustness, as they rely either on simple superposition in the noise space or on intrusive alterations to model weights. To bridge this gap, we propose VocBulwark, an additional-parameter injection framework that freezes generative model parameters to preserve perceptual quality. Specifically, we design a Temporal Adapter to deeply entangle watermarks with acoustic attributes, synergizing with a Coarse-to-Fine Gated Extractor to resist advanced attacks. Furthermore, we develop an Accuracy-Guided Optimization Curriculum that dynamically orchestrates gradient flow to resolve the optimization conflict between fidelity and robustness. Comprehensive experiments demonstrate that VocBulwark achieves high-capacity and high-fidelity…
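The abstract does not say how the Accuracy-Guided Optimization Curriculum schedules training, so the following is only one plausible reading, sketched with plain loss weighting rather than direct control of gradient flow. The function names, the 0.99 target, and the 0.1 floor are assumptions, not values from the paper. The idea shown: while extraction accuracy is near chance, the extraction (robustness) term dominates; as accuracy approaches the target, weight shifts toward fidelity.

```python
from typing import Tuple


def curriculum_weights(
    bit_accuracy: float, target: float = 0.99, floor: float = 0.1
) -> Tuple[float, float]:
    """Return (fidelity_weight, robustness_weight) from current bit accuracy.

    Hypothetical schedule: `target` and `floor` are illustrative assumptions.
    """
    if not 0.0 <= bit_accuracy <= 1.0:
        raise ValueError("bit_accuracy must lie in [0, 1]")
    # Progress from chance level (0.5) to the target accuracy, clipped to [0, 1].
    progress = min(1.0, max(0.0, (bit_accuracy - 0.5) / (target - 0.5)))
    fidelity_weight = floor + (1.0 - floor) * progress
    robustness_weight = 1.0 - fidelity_weight + floor
    return fidelity_weight, robustness_weight


def total_loss(
    fidelity_loss: float, extraction_loss: float, bit_accuracy: float
) -> float:
    """Combine both objectives using the accuracy-dependent weights."""
    fidelity_weight, robustness_weight = curriculum_weights(bit_accuracy)
    return fidelity_weight * fidelity_loss + robustness_weight * extraction_loss
```

Because each weight scales the gradient of its loss term, this is a crude proxy for steering gradient flow; the actual mechanism in the paper may differ.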

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Speech Recognition and Synthesis