Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development
Morgan Klaus Scheuerman, Emily Denton, Alex Hanna

TL;DR
This paper investigates how computer vision datasets reflect disciplinary values and biases, revealing trade-offs like efficiency versus care and universality versus contextuality, and suggests ways to incorporate diverse values into dataset creation.
Contribution
It provides a systematic analysis of dataset documentation in computer vision, highlighting underlying values and proposing improvements for more ethical data practices.
Findings
Datasets often prioritize efficiency over care.
Universal datasets may overlook contextual differences.
Silenced values include social context and the positionality of dataset authors.
Abstract
Data is a crucial component of machine learning. The field is reliant on data to train, validate, and test models. With increased technical capabilities, machine learning research has boomed in both academic and industry settings, and one major focus has been on computer vision. Computer vision is a popular domain of machine learning increasingly pertinent to real-world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision's propensity to shape machine learning research and impact human life, we seek to understand disciplinary practices around dataset documentation - how data is collected, curated, annotated, and packaged into datasets for computer vision researchers and practitioners to use for model tuning and development. Specifically, we examine what dataset documentation communicates about the underlying values of…
