Watermarking Generative Categorical Data

Bochao Gu; Hengzhi He; Guang Cheng

arXiv:2411.10898·cs.CR·November 19, 2024

Open Access

TL;DR

This paper introduces a statistical framework for watermarking generative categorical data at the distribution level. The watermark is verified through distributional analysis, which makes the approach especially useful for synthetic data generation.

Contribution

The paper presents a distribution-level watermarking method for generative categorical data, with a verification process based on total variation distance. Prior categorical watermarking methods instead focus on embedding the watermark into a given dataset.

Findings

01. Effective watermark embedding at the distribution level.

02. Verification via total variation distance is reliable.

03. The method is validated both theoretically and empirically.

Abstract

In this paper, we propose a novel statistical framework for watermarking generative categorical data. Our method systematically embeds pre-agreed secret signals by splitting the data distribution into two components and modifying one distribution based on a deterministic relationship with the other, ensuring the watermark is embedded at the distribution level. To verify the watermark, we introduce an insertion inverse algorithm and detect its presence by measuring the total variation distance between the inverse-decoded data and the original distribution. Unlike previous categorical watermarking methods, which primarily focus on embedding watermarks into a given dataset, our approach operates at the distribution level, allowing for verification from a statistical distributional perspective. This makes it particularly well-suited for the modern paradigm of synthetic data generation,…
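The verification step named in the abstract compares two categorical distributions by total variation distance. The following is a minimal sketch of that distance only, not the paper's insertion or insertion-inverse algorithm, which this page does not spell out. The function names, the toy category labels, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def empirical_distribution(samples, categories):
    """Relative frequency of each category in `samples` (hypothetical helper)."""
    counts = Counter(samples)
    n = len(samples)
    return {c: counts.get(c, 0) / n for c in categories}


def total_variation(p, q):
    """Total variation distance between two categorical distributions.

    Both arguments are dicts mapping category -> probability; a category
    missing from one dict is treated as having probability 0 there.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in support)


# Toy reference distribution (assumed, not from the paper).
reference = {"a": 0.5, "b": 0.25, "c": 0.25}

# Samples whose frequencies match the reference exactly.
matching = empirical_distribution(["a", "a", "b", "c"], reference)

# Samples concentrated on a single category.
skewed = empirical_distribution(["a", "a", "a", "a"], reference)

tv_matching = total_variation(matching, reference)
tv_skewed = total_variation(skewed, reference)
```

In a detection setting one would presumably compare such a distance against a threshold chosen for the sample size. How the paper sets that threshold is not stated here, so none is assumed in the sketch.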

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Chaos-based Image/Signal Encryption · Cellular Automata and Applications
