IPProtect: protecting the intellectual property of visual datasets during data valuation
Gursimran Singh, Chendi Wang, Ahnaf Tazwar, Lanjun Wang, Yong Zhang

TL;DR
This paper introduces IPProtect, a method to safeguard the intellectual property of visual datasets during data valuation, balancing IP protection against the data's utility for machine learning tasks.
Contribution
It formalizes visual dataset IP risks and proposes a novel sanitization algorithm that protects IP while enabling accurate data valuation.
Findings
Sanitizes datasets so that they resist IP violations
Maintains data utility for machine learning tasks
Outperforms baseline methods in experiments
Abstract
Data trading is essential to accelerate the development of data-driven machine learning pipelines. The central problem in data trading is to estimate the utility of a seller's dataset with respect to a given buyer's machine learning task, also known as data valuation. Typically, data valuation requires one or more participants to share their raw dataset with others, leading to potential risks of intellectual property (IP) violations. In this paper, we tackle the novel task of preemptively protecting the IP of datasets that need to be shared during data valuation. First, we identify and formalize two kinds of novel IP risks in visual datasets: data-item (image) IP and statistical (dataset) IP. Then, we propose a novel algorithm to convert the raw dataset into a sanitized version that resists IP violations while still allowing accurate data valuation. The key…
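To make the notion of data valuation in the abstract concrete, here is a minimal sketch of one naive way to score a seller's dataset against a buyer's task: train a simple model on the seller's data and measure its accuracy on the buyer's validation set. This is an illustration only and not the paper's algorithm. The nearest-centroid model, the function names (`fit_centroids`, `predict`, `valuation_score`), and the toy feature vectors are all assumptions introduced here. The sketch also shows why the IP risk arises: the scoring function needs direct access to the seller's raw samples.

```python
from collections import defaultdict


def fit_centroids(samples):
    """Compute one mean feature vector per label.

    samples: list of (feature_vector, label) pairs (hypothetical format).
    """
    sums = {}
    counts = defaultdict(int)
    for x, y in samples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def valuation_score(seller_data, buyer_validation):
    """Naive utility estimate: accuracy on the buyer's validation set
    of a model trained only on the seller's (raw) data."""
    centroids = fit_centroids(seller_data)
    correct = sum(predict(centroids, x) == y for x, y in buyer_validation)
    return correct / len(buyer_validation)


# Toy data: seller_a matches the buyer's labeling, seller_b does not.
seller_a = [([0.0, 0.1], "cat"), ([0.2, 0.0], "cat"),
            ([1.0, 0.9], "dog"), ([0.9, 1.1], "dog")]
seller_b = [([0.0, 0.1], "dog"), ([0.2, 0.0], "dog"),
            ([1.0, 0.9], "cat"), ([0.9, 1.1], "cat")]
buyer_val = [([0.1, 0.1], "cat"), ([1.0, 1.0], "dog")]

print(valuation_score(seller_a, buyer_val))
print(valuation_score(seller_b, buyer_val))
```

In this toy setup the first seller's data is far more useful to the buyer than the second's. A sanitization step of the kind the abstract describes would transform `seller_data` before it is shared, with the aim of keeping such scores close to their raw-data values.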
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Blockchain Technology Applications and Security · Retinal Imaging and Analysis
