Robust Spectral Watermark for Synthetic Tabular Data
Yizhou Zhao, Xiang Li, Peter Song, Qi Long, Weijie Su

TL;DR
This paper introduces TAB-DRW, an efficient post-editing watermarking scheme for synthetic tabular data. It embeds watermark signals in the frequency domain, targeting robustness to post-processing attacks and high data fidelity on mixed discrete-continuous features.
Contribution
The paper presents TAB-DRW, a new robust, computationally efficient watermarking scheme for synthetic tabular data that handles mixed data types and resists post-processing attacks.
Findings
Achieves strong detectability and robustness against common post-processing attacks.
Preserves high data fidelity and supports mixed feature types.
Outperforms existing watermarking methods on benchmark datasets.
Abstract
The rise of generative AI has enabled the production of high-fidelity synthetic tabular data across fields such as healthcare, finance, and public policy, raising growing concerns about data provenance and misuse. Watermarking offers a promising solution to address these concerns by ensuring the traceability of synthetic data, but existing methods face many limitations: they are computationally expensive due to reliance on the inverse process of large diffusion models, struggle with mixed discrete-continuous data, or lack robustness to common post-processing attacks. To address these limitations, we propose TAB-DRW, an efficient and robust post-editing watermarking scheme for synthetic tabular data. TAB-DRW embeds watermark signals in the frequency domain: it normalizes heterogeneous features via the Yeo-Johnson transformation and standardization, applies the discrete Fourier transform…
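The normalization and transform steps named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed Yeo-Johnson parameter `lam`, the choice to take the DFT along each row (across features), and the helper names `to_frequency` and `from_frequency` are all hypothetical. In practice the Yeo-Johnson parameter is usually fitted per column by maximum likelihood. The watermark embedding step itself is elided in the abstract above, so it is left out here.

```python
import numpy as np


def yeo_johnson(x, lam):
    """Elementwise Yeo-Johnson transform with a fixed parameter lam (assumption)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lam) > 1e-12:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(x[pos])
    if abs(lam - 2.0) > 1e-12:
        out[~pos] = -(((1.0 - x[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam))
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out


def yeo_johnson_inverse(y, lam):
    """Inverse of yeo_johnson; the transform preserves sign, so branch on y."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    if abs(lam) > 1e-12:
        out[pos] = (lam * y[pos] + 1.0) ** (1.0 / lam) - 1.0
    else:
        out[pos] = np.expm1(y[pos])
    if abs(lam - 2.0) > 1e-12:
        out[~pos] = 1.0 - (1.0 - (2.0 - lam) * y[~pos]) ** (1.0 / (2.0 - lam))
    else:
        out[~pos] = 1.0 - np.exp(-y[~pos])
    return out


def to_frequency(table, lam=0.5):
    """Yeo-Johnson, then per-column standardization, then a DFT along each row."""
    z = yeo_johnson(table, lam)
    mean = z.mean(axis=0)
    std = z.std(axis=0)  # assumed non-zero for every column
    spectrum = np.fft.fft((z - mean) / std, axis=1)  # row-wise axis is an assumption
    return spectrum, mean, std


def from_frequency(spectrum, mean, std, lam=0.5):
    """Undo to_frequency: inverse DFT, de-standardize, inverse Yeo-Johnson."""
    standardized = np.fft.ifft(spectrum, axis=1).real
    return yeo_johnson_inverse(standardized * std + mean, lam)


# Round trip on a small random table (synthetic numbers, for illustration only).
rng = np.random.default_rng(0)
table = rng.normal(size=(5, 4)) * 3.0
spectrum, mean, std = to_frequency(table)
recovered = from_frequency(spectrum, mean, std)
```

With no modification to `spectrum`, `recovered` should match `table` up to floating-point error, which is the property a frequency-domain scheme relies on: any change to the data comes only from the embedded signal, not from the transform pipeline.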
