Watermarking Generative Categorical Data
Bochao Gu, Hengzhi He, Guang Cheng

TL;DR
This paper introduces a new statistical framework for watermarking generative categorical data at the distribution level, enabling verification through distributional analysis, especially useful for synthetic data generation.
Contribution
The paper presents a novel distribution-level watermarking method for generative categorical data, with a verification process based on total variation distance, differing from prior point-based approaches.
Findings
Effective watermark embedding at the distribution level.
Verification via total variation distance is reliable.
The method is validated both theoretically and empirically.
Abstract
In this paper, we propose a novel statistical framework for watermarking generative categorical data. Our method systematically embeds pre-agreed secret signals by splitting the data distribution into two components and modifying one component based on a deterministic relationship with the other, ensuring the watermark is embedded at the distribution level. To verify the watermark, we introduce an insertion inverse algorithm and detect the watermark's presence by measuring the total variation distance between the inverse-decoded data and the original distribution. Unlike previous categorical watermarking methods, which primarily focus on embedding watermarks into a given dataset, our approach operates at the distribution level, allowing for verification from a statistical distributional perspective. This makes it particularly well-suited for the modern paradigm of synthetic data generation,…
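The verification step above relies on the total variation distance between two categorical distributions, which for distributions p and q over a finite set equals half the sum of |p(x) - q(x)|. The following is a minimal sketch of that distance only, not the authors' insertion or inverse algorithm; the function names, the example categories, and the `reference` distribution are assumptions made for illustration.

```python
from collections import Counter


def empirical_distribution(samples):
    """Return the empirical category frequencies of a list of samples."""
    counts = Counter(samples)
    n = len(samples)
    return {category: count / n for category, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two categorical distributions.

    p and q are dicts mapping category -> probability; a category
    missing from one dict is treated as having probability 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


# Hypothetical reference distribution and hypothetical decoded samples.
reference = {"a": 0.5, "b": 0.3, "c": 0.2}
decoded = ["a"] * 5 + ["b"] * 3 + ["c"] * 2

tv = total_variation(empirical_distribution(decoded), reference)
print(f"TV distance: {tv:.4f}")
```

A small distance indicates that the decoded samples are statistically close to the reference distribution. How a detection threshold should be chosen is part of the paper's analysis and is not reproduced here.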
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Chaos-based Image/Signal Encryption · Cellular Automata and Applications
