Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards
Angelina McMillan-Major, Salomey Osei, Juan Diego Rodriguez, Pawan Sasanka Ammanamanchi, Sebastian Gehrmann, Yacine Jernite

TL;DR
This paper presents reusable documentation templates for NLP datasets and models, illustrated through two case studies (the HuggingFace data card and the GEM benchmark data and model cards), with the aim of standardizing and improving documentation practices.
Contribution
It introduces a systematic process for developing standardized, reusable documentation templates for NLP datasets and models, based on case studies and stakeholder feedback.
Findings
Enhanced clarity and consistency in dataset and model documentation.
Facilitated adoption of standard documentation practices in NLP.
Provided a replicable process for creating documentation templates.
Abstract
Developing documentation guidelines and easy-to-use templates for datasets and models is a challenging task, especially given the variety of backgrounds, skills, and incentives of the people involved in building natural language processing (NLP) tools. Nevertheless, the adoption of standard documentation practices across the field of NLP promotes more accessible and detailed descriptions of NLP datasets and models, while supporting researchers and developers in reflecting on their work. To help with the standardization of documentation, we present two case studies of efforts that aim to develop reusable documentation templates: the HuggingFace data card, a general-purpose card for datasets in NLP, and the GEM benchmark data and model cards, which focus on natural language generation. We describe our process for developing these templates, including the identification of…
