Large image datasets: A pyrrhic win for computer vision?

Vinay Uday Prabhu; Abeba Birhane

arXiv:2006.16923·cs.CY·July 27, 2020·44 cites

Open Access · 2 Repos

TL;DR

This paper critically examines large-scale vision datasets such as ImageNet, documents ethical problems including images captured without consent, and proposes safeguards such as Institutional Review Board (IRB) oversight to improve dataset curation practices.

Contribution

It provides a detailed ethical analysis of ImageNet, including a census of problematic images, suggests corrective actions, and open-sources tools for the community.

Findings

01 Identification of verifiably pornographic images in ImageNet

02 Quantitative analysis of ethical transgressions in datasets

03 Recommendations for ethical dataset curation practices

Abstract

In this paper we investigate problematic practices and consequences of large scale vision datasets. We examine broad issues such as the question of consent and justice as well as specific concerns such as the inclusion of verifiably pornographic images in datasets. Taking the ImageNet-ILSVRC-2012 dataset as an example, we perform a cross-sectional model-based quantitative census covering factors such as age, gender, NSFW content scoring, class-wise accuracy, human-cardinality-analysis, and the semanticity of the image class information in order to statistically investigate the extent and subtleties of ethical transgressions. We then use the census to help hand-curate a look-up-table of images in the ImageNet-ILSVRC-2012 dataset that fall into the categories of verifiably pornographic: shot in a non-consensual setting (up-skirt), beach voyeuristic, and exposed private parts. We survey…

Taxonomy

Topics: Ethics and Social Impacts of AI · Face recognition and analysis · Sexuality, Behavior, and Technology