Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure
Ben Hutchinson, Andrew Smart, Alex Hanna, Emily Denton, Christina Greer, Oddur Kjartansson, Parker Barnes, Margaret Mitchell

TL;DR
This paper proposes a comprehensive framework for increasing transparency and accountability in machine learning dataset development by applying best practices from software engineering to document and communicate each stage of the process.
Contribution
The paper introduces a structured, lifecycle-based framework for transparency in dataset development, addressing an accountability gap in AI systems.
Findings
Framework supports decision-making and accountability
Documents facilitate communication among stakeholders
Highlights the importance of careful data work
Abstract
Rising concern for the societal implications of artificial intelligence systems has inspired demands for greater transparency and accountability. However, the datasets which empower machine learning are often used, shared and re-used with little visibility into the processes of deliberation which led to their creation. Which stakeholder groups had their perspectives included when the dataset was conceived? Which domain experts were consulted regarding how to model subgroups and other phenomena? How were questions of representational biases measured and addressed? Who labeled the data? In this paper, we introduce a rigorous framework for dataset development transparency which supports decision-making and accountability. The framework uses the cyclical, infrastructural and engineering nature of dataset development to draw on best practices from the software development lifecycle. Each…
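To make the idea of lifecycle-based documentation concrete, here is a minimal sketch of how a team might track which development stages have a written record and which of the abstract's accountability questions remain unanswered. It is an illustration under stated assumptions, not the paper's own schema: the abstract is truncated before it describes the stages, so the stage names (`REQUIREMENTS`, `DESIGN`, `IMPLEMENTATION`, `TESTING`, `MAINTENANCE`) are assumed to mirror a generic software development lifecycle, and the classes `Stage`, `StageDocument` and `DatasetRecord` are hypothetical. Only the four questions come from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names, assumed to mirror a generic software
    # development lifecycle; not taken from the paper's text.
    REQUIREMENTS = "requirements"
    DESIGN = "design"
    IMPLEMENTATION = "implementation"
    TESTING = "testing"
    MAINTENANCE = "maintenance"


# Accountability questions paraphrased from the abstract.
ACCOUNTABILITY_QUESTIONS = (
    "Which stakeholder groups had their perspectives included?",
    "Which domain experts were consulted?",
    "How were representational biases measured and addressed?",
    "Who labeled the data?",
)


@dataclass
class StageDocument:
    """Hypothetical record of decisions made during one lifecycle stage."""
    stage: Stage
    author: str
    answers: dict[str, str] = field(default_factory=dict)


@dataclass
class DatasetRecord:
    """Hypothetical container linking a dataset to its stage documents."""
    name: str
    documents: list[StageDocument] = field(default_factory=list)

    def add(self, doc: StageDocument) -> None:
        self.documents.append(doc)

    def missing_stages(self) -> list[Stage]:
        # Stages with no document yet, i.e. gaps in the paper trail,
        # returned in lifecycle (definition) order.
        documented = {doc.stage for doc in self.documents}
        return [s for s in Stage if s not in documented]

    def unanswered_questions(self) -> list[str]:
        # Questions that no stage document answers with non-empty text.
        answered = {
            q for doc in self.documents
            for q, a in doc.answers.items() if a.strip()
        }
        return [q for q in ACCOUNTABILITY_QUESTIONS if q not in answered]


if __name__ == "__main__":
    record = DatasetRecord("example-dataset")
    record.add(StageDocument(
        Stage.REQUIREMENTS,
        "data team",
        {ACCOUNTABILITY_QUESTIONS[0]: "Community advisory panel"},
    ))
    print([s.value for s in record.missing_stages()])
    print(record.unanswered_questions())
```

Here, the only stage recorded is requirements, so the first call lists the four undocumented stages. The second lists the three questions still lacking an answer. Keeping answers attached to the stage where a decision was made, rather than in one final datasheet, is meant to reflect the cyclical, stage-by-stage documentation the abstract describes.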
Taxonomy
Topics: Ethics and Social Impacts of AI · Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI)
