The State of Documentation Practices of Third-party Machine Learning Models and Datasets

Ernesto Lang Oreamuno; Rohan Faiyaz Khan; Abdul Ali Bangash; Catherine Stinson; Bram Adams

arXiv:2312.15058·cs.SE·June 19, 2024·1 cites

Open Access

TL;DR

This paper evaluates the current state of documentation practices for third-party ML models and datasets on Hugging Face, revealing low documentation rates and inconsistencies in ethics and transparency disclosures.

Contribution

It provides a large-scale analysis of documentation practices for models and datasets, highlighting gaps and inconsistencies in current standards.

Findings

1. Only 39.62% of models have documentation.

2. Only 28.48% of datasets have documentation.

3. Inconsistencies exist in ethics and transparency documentation.

Abstract

Model stores offer third-party ML models and datasets for easy project integration, minimizing coding efforts. One might hope to find detailed specifications of these models and datasets in the documentation, leveraging documentation standards such as model and dataset cards. In this study, we use statistical analysis and hybrid card sorting to assess the state of the practice of documenting model cards and dataset cards in one of the largest model stores in use today, Hugging Face (HF). Our findings show that only 21,902 models (39.62%) and 1,925 datasets (28.48%) have documentation. Furthermore, we observe inconsistency in ethics and transparency-related documentation for ML models and datasets.
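The abstract reports documented counts together with their percentage shares, but not the size of the populations those shares refer to. As a rough sanity check, the sketch below back-computes the approximate totals from the reported figures. The helper name `implied_total` is hypothetical and not from the paper, and because the percentages are rounded to two decimals, the results are estimates rather than the exact numbers the authors analyzed.

```python
def implied_total(documented: int, rate_percent: float) -> int:
    """Estimate the population size from a documented count and its
    reported percentage share (hypothetical helper, not from the paper)."""
    return round(documented / (rate_percent / 100.0))


# Figures taken from the abstract.
models_documented, models_rate = 21_902, 39.62
datasets_documented, datasets_rate = 1_925, 28.48

# Approximate totals implied by those figures.
models_total = implied_total(models_documented, models_rate)
datasets_total = implied_total(datasets_documented, datasets_rate)

print(f"Implied number of models:   ~{models_total:,}")
print(f"Implied number of datasets: ~{datasets_total:,}")
```

Under these assumptions, the figures imply on the order of 55,000 models and just under 7,000 datasets, which also means that most of both populations had no documentation at the time of the study.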

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification