A Standardized Machine-readable Dataset Documentation Format for Responsible AI

Nitisha Jain; Mubashara Akhtar; Joan Giner-Miguelez; Rajat Shinde; Joaquin Vanschoren; Steffen Vogler; Sujata Goswami; Yuhan Rao; Tim Santos; Luis Oala; Michalis Karamousadakis; Manil Maskey; Pierre Marcenac; Costanza Conforti; Michael Kuchnik; Lora Aroyo; Omar Benjelloun; Elena Simperl

arXiv:2407.16883·cs.IR·July 25, 2024

TL;DR

This paper introduces Croissant-RAI, a standardized, machine-readable dataset documentation format to improve discoverability, interoperability, and trustworthiness of AI datasets, supporting responsible AI practices.

Contribution

It extends existing metadata standards to create a community-driven, adaptable format integrated with web practices and tools for better dataset documentation in AI.

Findings

01

Croissant-RAI is adopted by major data platforms.

02

It improves dataset discoverability and trustworthiness.

03

It supports community-driven, evolving documentation standards.

Abstract

Data is critical to advancing AI technologies, yet its quality and documentation remain significant challenges, leading to adverse downstream effects (e.g., potential biases) in AI applications. This paper addresses these issues by introducing Croissant-RAI, a machine-readable metadata format designed to enhance the discoverability, interoperability, and trustworthiness of AI datasets. Croissant-RAI extends the Croissant metadata format and builds upon existing responsible AI (RAI) documentation frameworks, offering a standardized set of attributes and practices to facilitate community-wide adoption. Leveraging established web-publishing practices, such as Schema.org, Croissant-RAI enables dataset users to easily find and utilize RAI metadata regardless of the platform on which the datasets are published. Furthermore, it is seamlessly integrated into major data search engines,…
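To make the idea of machine-readable RAI metadata layered on Schema.org more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the format's normative definition: the `rai` namespace URI, the attribute names (`dataCollection`, `dataBiases`, `dataLimitations`), the helper `build_rai_metadata`, and the dataset itself are all assumptions or hypothetical values that should be checked against the published Croissant-RAI specification.

```python
import json


def build_rai_metadata(name, description, rai_fields):
    """Return a JSON-LD-style dict describing a dataset with RAI attributes.

    Hypothetical helper for illustration; not part of any Croissant library.
    """
    record = {
        "@context": {
            "@vocab": "https://schema.org/",
            # Assumed namespace URI; verify against the specification.
            "rai": "http://mlcommons.org/croissant/RAI/",
        },
        "@type": "Dataset",
        "name": name,
        "description": description,
    }
    # Prefix each RAI attribute so it is distinguishable from Schema.org terms.
    for key, value in rai_fields.items():
        record[f"rai:{key}"] = value
    return record


# Hypothetical dataset and attribute values, invented for this example.
example = build_rai_metadata(
    name="toy-reviews",
    description="Hypothetical dataset of product reviews.",
    rai_fields={
        "dataCollection": "Reviews gathered from a public web forum.",
        "dataBiases": "English-only; skews toward recent products.",
        "dataLimitations": "Not suitable for demographic analysis.",
    },
)

serialized = json.dumps(example, indent=2)
print(serialized)
```

Because the record is plain JSON-LD, it could in principle be embedded in a dataset's web page alongside its Schema.org description, which is what lets a crawler or search engine read the RAI fields without knowing which platform hosts the data.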
