TL;DR
This paper evaluates the radioactive data method for dataset ownership verification in ML, revealing its strengths and limitations in different settings and its robustness against model extraction attacks.
Contribution
The study critically assesses radioactive data as a dataset watermarking method. It demonstrates that the method is effective for black-box verification and robust against model extraction, and it highlights limitations on datasets with few classes or few samples.
Findings
Black-box verification is effective across datasets.
White-box verification works on large datasets with many classes but is less effective on datasets with few classes or few samples.
Radioactive data survives model extraction attacks.
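To make the idea of black-box verification concrete, here is a minimal sketch of one way an owner could test a suspect model using only its outputs: compare the model's loss on marked images against the loss on their unmarked counterparts and ask whether the gap is larger than chance would explain. This summary does not specify the verification procedure, so the paired sign-flip permutation test, the function name `paired_loss_test`, and the synthetic loss values below are all illustrative assumptions rather than the paper's method.

```python
import random
import statistics


def paired_loss_test(loss_marked, loss_clean, n_perm=2000, seed=0):
    """One-sided paired sign-flip permutation test (hypothetical helper).

    H0: the model's loss on marked images is not lower than on the
        matching clean images.
    Returns (mean_gap, p_value) with mean_gap = mean(clean - marked).
    """
    if len(loss_marked) != len(loss_clean):
        raise ValueError("loss lists must have the same length")

    diffs = [c - m for m, c in zip(loss_marked, loss_clean)]
    observed = statistics.fmean(diffs)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under H0 the sign of each paired difference is arbitrary.
        flipped = statistics.fmean(
            d if rng.random() < 0.5 else -d for d in diffs
        )
        if flipped >= observed:
            hits += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic per-image losses (assumed values, not results from the paper).
rng = random.Random(1)
clean = [rng.gauss(2.0, 0.3) for _ in range(200)]
marked = [c - 0.2 + rng.gauss(0.0, 0.1) for c in clean]

gap, p = paired_loss_test(marked, clean)
print(f"mean loss gap = {gap:.3f}, p-value = {p:.4f}")
```

A small p-value would suggest the model fits the marked images better than chance alone explains, which is the kind of evidence an ownership claim could rest on. In practice the losses would come from querying the suspect model, not from a random generator.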
Abstract
In a data-driven world, datasets constitute a significant economic value. Dataset owners who spend time and money to collect and curate the data are incentivized to ensure that their datasets are not used in ways that they did not authorize. When such misuse occurs, dataset owners need technical mechanisms for demonstrating their ownership of the dataset in question. Dataset watermarking provides one approach for ownership demonstration which can, in turn, deter unauthorized use. In this paper, we investigate a recently proposed data provenance method, radioactive data, to assess if it can be used to demonstrate ownership of (image) datasets used to train machine learning (ML) models. The original paper reported that radioactive data is effective in white-box settings. We show that while this is true for large datasets with many classes, it is not as effective for datasets where the…
