
TL;DR
This paper emphasizes the importance of legal compliance in building datasets for data-centric AI, reviewing legal obligations, analyzing impacts on ML pipelines, and proposing a framework for creating lawful datasets.
Contribution
It provides a comprehensive review of legal obligations and introduces a practical framework for constructing legally compliant ML datasets.
Findings
Legal obligations significantly influence dataset construction
Data laws impact ML pipeline design and data sharing
A framework aids in building legally compliant datasets
Abstract
Data-centric AI calls for better, not just bigger, datasets. As data protection laws with extra-territorial reach proliferate worldwide, ensuring datasets are legal is an increasingly crucial yet overlooked component of "better". To help dataset builders become more willing and able to navigate this complex legal space, this paper reviews key legal obligations surrounding ML datasets, examines the practical impact of data laws on ML pipelines, and offers a framework for building legal datasets.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Digital and Cyber Forensics
