A Critical Field Guide for Working with Machine Learning Datasets
Sarah Ciston, Mike Ananny, Kate Crawford

TL;DR
This paper provides practical guidance and critical perspectives for responsibly managing machine learning datasets throughout their lifecycle, emphasizing ethical, legal, and technical considerations.
Contribution
It introduces a comprehensive, accessible framework combining critical AI theories with applied data science strategies for conscientious dataset stewardship.
Findings
Offers questions and strategies for dataset evaluation
Provides resources for ethical and legal considerations
Enhances understanding of dataset lifecycle management
Abstract
Machine learning datasets are powerful but unwieldy. Although large datasets commonly contain problematic material, whether from a technical, legal, or ethical perspective, they are valuable resources when handled carefully and critically. A Critical Field Guide for Working with Machine Learning Datasets offers practical guidance for conscientious dataset stewardship. It provides questions, suggestions, strategies, and resources for working with existing machine learning datasets at every phase of their lifecycle. It combines critical AI theories and applied data science concepts, explained in accessible language. Equipped with this understanding, students, journalists, artists, researchers, and developers can better avoid the problems unique to datasets. They can also construct more reliable, robust solutions, or even explore new ways of thinking with…
Taxonomy
Topics: Big Data Technologies and Applications
