Datasheets for Datasets
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford

TL;DR
This paper introduces a standardized documentation format called datasheets for datasets, aiming to improve transparency, accountability, and communication in how machine learning datasets are created and used.
Contribution
It proposes a structured datasheet template for datasets, inspired by the datasheets that accompany electronic components, that documents a dataset's motivation, composition, collection process, and recommended uses.
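To make "structured" concrete, here is a rough sketch that models a datasheet as a Python record whose fields are the sections named above, plus a helper that flags sections left blank. The class `Datasheet`, its field names, and the `missing_sections` and `to_text` helpers are hypothetical illustrations; the paper frames a datasheet as a set of questions answered in prose, not as a code schema.

```python
from dataclasses import dataclass, fields


@dataclass
class Datasheet:
    """Hypothetical minimal datasheet record (not the paper's own format)."""
    dataset_name: str
    motivation: str = ""          # why the dataset was created, and by whom
    composition: str = ""         # what the instances are and what they contain
    collection_process: str = ""  # how and when the data was gathered
    recommended_uses: str = ""    # tasks the dataset is (and is not) suited for

    def missing_sections(self) -> list[str]:
        """Return the names of documentation sections that are still empty."""
        return [
            f.name
            for f in fields(self)
            if f.name != "dataset_name" and not getattr(self, f.name).strip()
        ]

    def to_text(self) -> str:
        """Render the datasheet as plain text, one block per section."""
        lines = [f"Datasheet: {self.dataset_name}"]
        for f in fields(self):
            if f.name == "dataset_name":
                continue
            title = f.name.replace("_", " ").capitalize()
            body = getattr(self, f.name).strip() or "(not yet documented)"
            lines.append(f"{title}\n  {body}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dataset, used only to show the workflow.
    sheet = Datasheet(
        dataset_name="example-corpus",
        motivation="Created to benchmark sentiment classifiers.",
        composition="Product reviews paired with star ratings.",
    )
    print(sheet.missing_sections())  # ['collection_process', 'recommended_uses']
    print(sheet.to_text())
```

Keeping each section as a named field means a dataset creator can see at a glance which parts of the documentation are still missing before releasing the dataset.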
Findings
Enhances transparency and accountability in dataset documentation.
Facilitates better communication between dataset creators and users.
Encourages adoption of standardized dataset documentation practices.
Abstract
The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied by a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied by a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. Datasheets for datasets will facilitate better communication between dataset creators and dataset consumers, and encourage the machine learning community to prioritize transparency and accountability.
Code & Models
- 🤗 BSC-LT/salamandra-7b-instruct · model · 81k downloads · ♡ 77
- 🤗 BSC-LT/salamandra-7b · model · 355 downloads · ♡ 29
- 🤗 BSC-LT/salamandra-2b · model · 1.3k downloads · ♡ 25
- 🤗 BSC-LT/salamandra-2b-instruct · model · 6.3k downloads · ♡ 27
- 🤗 robbiemu/salamandra-2b-instruct · model · 92 downloads
- 🤗 RichardErkhov/BSC-LT_-_salamandra-7b-instruct-gguf · model · 141 downloads
- 🤗 RichardErkhov/BSC-LT_-_salamandra-7b-gguf · model · 73 downloads
- 🤗 robbiemu/salamandra-2b · model · 111 downloads
- 🤗 RichardErkhov/BSC-LT_-_salamandra-2b-instruct-gguf · model · 356 downloads
- 🤗 RichardErkhov/BSC-LT_-_salamandra-2b-gguf · model · 78 downloads
Videos
Machine Learning Experts - Margaret Mitchell · YouTube
