Compliance Rating Scheme: A Data Provenance Framework for Generative AI Datasets
Matyas Bohacek, Ignacio Vilanova Echavarri

TL;DR
This paper introduces the Compliance Rating Scheme (CRS), a framework and open-source tool for assessing the transparency, accountability, and security of datasets used in Generative AI, addressing ethical and legal concerns.
Contribution
The paper presents a novel framework and an open-source library for evaluating and improving data provenance and compliance in Generative AI (GAI) datasets.
Findings
Provides a practical framework for dataset compliance evaluation
Offers an open-source library for seamless integration into AI workflows
Enhances transparency and accountability in dataset creation and sharing
Abstract
Generative Artificial Intelligence (GAI) has experienced exponential growth in recent years, partly facilitated by the abundance of large-scale open-source datasets. These datasets are often built using unrestricted and opaque data collection practices. While most literature focuses on the development and applications of GAI models, the ethical and legal considerations surrounding the creation of these datasets are often neglected. In addition, as datasets are shared, edited, and further reproduced online, information about their origin, legitimacy, and safety often gets lost. To address this gap, we introduce the Compliance Rating Scheme (CRS), a framework designed to evaluate dataset compliance with critical transparency, accountability, and security principles. We also release an open-source Python library built around data provenance technology to implement this framework, allowing…
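To make the idea of rating a dataset against transparency, accountability, and security principles concrete, here is a minimal sketch. It is not the CRS library's API and not the paper's actual rubric: the criterion names, the `DatasetRecord` type, the thresholds, and the "high"/"medium"/"low" labels are all hypothetical, chosen only to illustrate how a checklist-style rating might be computed.

```python
from dataclasses import dataclass, field

# Hypothetical criteria grouped under the three principles named in the abstract.
# These names are illustrative assumptions, not the paper's checklist.
CRITERIA = {
    "transparency": ["sources_documented", "license_stated"],
    "accountability": ["maintainer_contact", "opt_out_mechanism"],
    "security": ["provenance_signed", "content_screened"],
}


@dataclass
class DatasetRecord:
    """Hypothetical description of a dataset and the criteria it satisfies."""
    name: str
    satisfied: set = field(default_factory=set)


def principle_scores(record: DatasetRecord) -> dict:
    """Return, per principle, the fraction of its criteria the dataset meets."""
    return {
        principle: sum(c in record.satisfied for c in checks) / len(checks)
        for principle, checks in CRITERIA.items()
    }


def overall_rating(record: DatasetRecord) -> str:
    """Map the mean principle score to a coarse label (thresholds are assumed)."""
    scores = principle_scores(record)
    mean = sum(scores.values()) / len(scores)
    if mean >= 0.8:
        return "high"
    if mean >= 0.5:
        return "medium"
    return "low"


example = DatasetRecord(
    name="example-corpus",
    satisfied={
        "sources_documented",
        "license_stated",
        "maintainer_contact",
        "content_screened",
    },
)
print(principle_scores(example))
print(overall_rating(example))
```

Averaging per-principle fractions keeps a dataset from scoring well by satisfying many criteria under one principle while ignoring another; a real scheme may weight or gate criteria differently.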
Taxonomy
Topics: Scientific Computing and Data Management · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education
