Seeing the Unseen: Errors and Bias in Visual Datasets

Hongrui Jin

arXiv:2211.01847 · cs.CV · November 4, 2022

TL;DR

This paper investigates errors and biases in visual datasets, highlighting how dataset flaws can lead to significant issues in machine vision applications like face recognition and autonomous driving.

Contribution

It identifies key sources of dataset errors, such as limited categories and poor classification, and analyzes their impact on algorithmic biases and failures.

Findings

01

Datasets often cover only a limited set of categories, which leads to misclassification.

02

Poor sourcing and classification contribute to biases in visual datasets.

03

Errors in datasets can cause serious real-world harms, such as the misrecognition of people of certain ethnicities.
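The first finding rests on a property of closed-set classifiers: a model trained on a fixed list of categories can only answer with one of those categories. The sketch that follows is a toy illustration and does not come from the paper. It assumes a softmax classifier, a hypothetical three-label set, and made-up scores. With those inputs, an image of an unlisted object is forced into the nearest known category. A simple confidence threshold (a hypothetical value of 0.8) is then used to reject the image instead.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Closed-set prediction: the answer is always one of the known labels."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

def classify_with_rejection(scores, labels, threshold=0.8):
    """Same prediction, but answer "unknown" when confidence is below threshold.

    The 0.8 default is a hypothetical value chosen for illustration.
    """
    label, confidence = classify(scores, labels)
    if confidence >= threshold:
        return label, confidence
    return "unknown", confidence

# Hypothetical label set with no category for zebras.
LABELS = ["cat", "dog", "horse"]
# Hypothetical scores a model might produce for a photo of a zebra.
zebra_scores = [0.2, 0.1, 1.5]

print(classify(zebra_scores, LABELS))                 # forced into a known label
print(classify_with_rejection(zebra_scores, LABELS))  # rejected as "unknown"
```

Thresholding is only a partial remedy. A wrong label can still score above the cutoff, so a threshold does not replace a more complete set of categories.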

Abstract

From face recognition in smartphones to automatic routing on self-driving cars, machine vision algorithms lie at the core of these features. These systems solve image-based tasks by identifying and understanding objects, and subsequently making decisions based on this information. However, errors in datasets are usually carried over into, or even magnified by, the algorithms trained on them, at times resulting in issues such as recognising black people as gorillas and misrepresenting ethnicities in search results. This paper tracks the errors in datasets and their impacts, revealing that a flawed dataset can be the result of limited categories, non-comprehensive sourcing and poor classification.
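Non-comprehensive sourcing can be checked, at least crudely, by auditing how a dataset's images are distributed across groups. The following is a minimal sketch and is not the paper's method. It assumes that each record carries a hypothetical "region" metadata field. It also assumes the auditor chooses a list of expected groups and a minimum share, here a made-up 15%. The toy records are invented for illustration.

```python
from collections import Counter

def group_shares(records, attribute):
    """Fraction of records belonging to each value of one metadata attribute."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def underrepresented(records, attribute, expected_groups, min_share):
    """Expected groups whose share is below min_share; absent groups count as 0."""
    shares = group_shares(records, attribute)
    return sorted(g for g in expected_groups if shares.get(g, 0.0) < min_share)

# Invented toy metadata with a hypothetical "region" field:
# 6 + 3 + 1 images, none from Asia.
records = (
    [{"region": "Europe"} for _ in range(6)]
    + [{"region": "North America"} for _ in range(3)]
    + [{"region": "Africa"}]
)
EXPECTED_GROUPS = ["Africa", "Asia", "Europe", "North America"]

print(group_shares(records, "region"))
print(underrepresented(records, "region", EXPECTED_GROUPS, min_share=0.15))
```

A share threshold says nothing about whether the images within a group are varied or correctly labelled. An audit like this can flag gaps in sourcing, but it cannot certify a dataset as unbiased.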

Taxonomy

Topics: Video Surveillance and Tracking Methods