TabularMark: Watermarking Tabular Datasets for Machine Learning
Yihao Zheng, Haocheng Xia, Junyuan Pang, Jinfei Liu, Kui Ren, Lingyang Chu, Yang Cao, Li Xiong

TL;DR
TabularMark is a novel watermarking scheme for tabular datasets that ensures data ownership protection while maintaining data utility for machine learning tasks, using hypothesis testing and data perturbation techniques.
Contribution
The paper introduces TabularMark, a watermarking method that balances detectability, non-intrusiveness, and robustness, specifically designed for tabular data used in machine learning.
Findings
TabularMark outperforms existing methods in detectability and robustness.
It preserves data utility for ML training with minimal impact.
The scheme is effective on both real-world and synthetic datasets.
Abstract
Watermarking is broadly utilized to protect ownership of shared data while preserving data utility. However, existing watermarking methods for tabular datasets fall short on the desired properties (detectability, non-intrusiveness, and robustness) and only preserve data utility from the perspective of data statistics, ignoring the performance of downstream ML models trained on the datasets. Can we watermark tabular datasets without significantly compromising their utility for training ML models while preventing attackers from training usable ML models on attacked datasets? In this paper, we propose a hypothesis testing-based watermarking scheme, TabularMark. Data noise partitioning is utilized for data perturbation during embedding, which is adaptable for numerical and categorical attributes while preserving the data utility. For detection, a custom-threshold one proportion z-test is…
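The abstract names two mechanisms: perturbing data with partitioned noise during embedding, and a one proportion z-test during detection. The following is a minimal sketch of how those two pieces could fit together for a numerical attribute, not the authors' implementation. Everything specific here is an assumption made for illustration: the perturbation range [-p, p], the split into k equal unit domains with half labelled "green", the per-cell seeds, the null proportion of 0.5, the threshold of 1.96, and all function names.

```python
import math
import random


def partition_domains(p, k, seed):
    """Split [-p, p] into k equal unit domains; label half 'green' via the seed.

    Assumption: a seeded 50/50 split. The abstract does not give these details.
    """
    width = 2 * p / k
    green = set(random.Random(seed).sample(range(k), k // 2))
    return width, green


def embed_noise(p, k, seed, rng):
    """Draw a perturbation that lies inside one of the green domains."""
    width, green = partition_domains(p, k, seed)
    idx = rng.choice(sorted(green))
    lo = -p + idx * width
    # Stay away from the domain edges so rounding cannot move the value
    # into a neighbouring domain (a choice made for this sketch only).
    return lo + width * (0.1 + 0.8 * rng.random())


def in_green(noise, p, k, seed):
    """Check whether an observed deviation falls in a green domain."""
    if not -p <= noise <= p:
        return False
    width, green = partition_domains(p, k, seed)
    idx = min(int((noise + p) // width), k - 1)
    return idx in green


def one_proportion_z(successes, trials, p0=0.5):
    """z-statistic of a one proportion z-test for H0: proportion == p0."""
    if trials <= 0 or not 0 < p0 < 1:
        raise ValueError("need trials > 0 and 0 < p0 < 1")
    p_hat = successes / trials
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / trials)


def detect(successes, trials, p0=0.5, z_threshold=1.96):
    """Flag a watermark when the green-domain hit rate is significantly above p0."""
    return one_proportion_z(successes, trials, p0) > z_threshold


if __name__ == "__main__":
    p, k, n_cells = 1.0, 10, 200
    # One hypothetical seed per selected cell; here simply the cell index.
    noises = [embed_noise(p, k, seed=i, rng=random.Random(1000 + i))
              for i in range(n_cells)]
    hits = sum(in_green(noise, p, k, seed=i) for i, noise in enumerate(noises))
    print(hits, n_cells, detect(hits, n_cells))
```

In this sketch the detector only needs the per-cell seeds and the deviation of each selected cell from its original value. Unwatermarked deviations would be expected to land in green domains about half the time, which is what the null proportion of 0.5 encodes. Raising z_threshold trades missed detections for fewer false alarms.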
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Internet Traffic Analysis and Secure E-voting · Chaos-based Image/Signal Encryption
