Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure

Ben Hutchinson; Andrew Smart; Alex Hanna; Emily Denton; Christina Greer; Oddur Kjartansson; Parker Barnes; Margaret Mitchell

arXiv:2010.13561 · cs.LG · February 2, 2021 · 46 cites

Open Access

TL;DR

This paper proposes a comprehensive framework for increasing transparency and accountability in machine learning dataset development by applying best practices from software engineering to document and communicate each stage of the process.

Contribution

It introduces a structured, lifecycle-based framework for dataset development transparency, addressing an accountability gap in AI systems.

Findings

01. Framework supports decision-making and accountability

02. Documents facilitate communication among stakeholders

03. Highlights the importance of careful data work

Abstract

Rising concern for the societal implications of artificial intelligence systems has inspired demands for greater transparency and accountability. However, the datasets which empower machine learning are often used, shared and re-used with little visibility into the processes of deliberation which led to their creation. Which stakeholder groups had their perspectives included when the dataset was conceived? Which domain experts were consulted regarding how to model subgroups and other phenomena? How were questions of representational biases measured and addressed? Who labeled the data? In this paper, we introduce a rigorous framework for dataset development transparency which supports decision-making and accountability. The framework uses the cyclical, infrastructural and engineering nature of dataset development to draw on best practices from the software development lifecycle. Each…

Taxonomy

Topics: Ethics and Social Impacts of AI · Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI)