Stereotyping and Bias in the Flickr30K Dataset

Emiel van Miltenburg

arXiv:1605.06083 · cs.CL · May 20, 2016 · 53 citations


TL;DR

This paper investigates biases and stereotypes in the Flickr30K dataset. It presents evidence that the descriptions often contain stereotype-driven, unwarranted inferences, which challenges the assumption that they are based solely on what the image shows.

Contribution

It catalogs biases and unwarranted inferences found in Flickr30K descriptions, considers methods for finding further examples, and discusses how stereotype-driven descriptions should be handled in future applications.

Findings

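01. Flickr30K descriptions contain stereotypes and biases.

02. Descriptions often include unwarranted inferences that go beyond what can be obtained from the image alone.

03. The paper considers methods for finding such examples and discusses how to deal with stereotype-driven descriptions in future applications.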

Abstract

An untested assumption behind the crowdsourced descriptions of the images in the Flickr30K dataset (Young et al., 2014) is that they "focus only on the information that can be obtained from the image alone" (Hodosh et al., 2013, p. 859). This paper presents some evidence against this assumption, and provides a list of biases and unwarranted inferences that can be found in the Flickr30K dataset. Finally, it considers methods to find examples of these, and discusses how we should deal with stereotype-driven descriptions in future applications.
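The abstract mentions methods for finding examples of biases and unwarranted inferences but does not spell them out, and the sketch below is not the author's procedure. It is one hypothetical heuristic, assuming each image comes with several independently written captions (as in Flickr30K) stored as a dict mapping an image ID to a list of strings. It flags images where an attribute term, here an ethnicity marker, appears in only a minority of that image's captions, on the reasoning that a detail most annotators leave out may be an inference rather than something clearly visible. The term list, the function names `minority_terms` and `flag_images`, and the 0.5 threshold are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical word list for illustration only; not taken from the paper.
ETHNICITY_TERMS = {"white", "black", "asian", "african", "caucasian", "hispanic"}


def tokenize(caption: str) -> list[str]:
    # Lowercase and keep only alphabetic tokens.
    return re.findall(r"[a-z]+", caption.lower())


def minority_terms(captions: list[str], terms: set[str],
                   max_share: float = 0.5) -> dict[str, int]:
    """Return terms used in at least one caption but in no more than
    max_share of the captions for a single image (assumed threshold)."""
    if not captions:
        return {}
    # Count each term at most once per caption (caption frequency).
    caption_freq = Counter()
    for caption in captions:
        caption_freq.update(set(tokenize(caption)) & terms)
    limit = max_share * len(captions)
    return {term: n for term, n in caption_freq.items() if n <= limit}


def flag_images(dataset: dict[str, list[str]],
                terms: set[str] = ETHNICITY_TERMS,
                max_share: float = 0.5) -> dict[str, dict[str, int]]:
    """Map each flagged image ID to its minority attribute terms."""
    flagged = {}
    for image_id, captions in dataset.items():
        hits = minority_terms(captions, terms, max_share)
        if hits:
            flagged[image_id] = hits
    return flagged
```

A plain keyword match cannot tell "a white shirt" from "a white man", and a term that every annotator uses is never flagged even if it is itself an inference, so the output is a list of candidates for manual inspection rather than confirmed cases of bias.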

