Seeing the Unseen: Errors and Bias in Visual Datasets
Hongrui Jin

TL;DR
This paper investigates errors and biases in visual datasets, highlighting how dataset flaws can lead to significant issues in machine vision applications like face recognition and autonomous driving.
Contribution
It identifies key sources of dataset errors, such as limited categories and poor classification, and analyzes their impact on algorithmic biases and failures.
Findings
Datasets often contain limited categories, leading to misclassification.
Poor sourcing and classification contribute to biases in visual datasets.
Errors in datasets can cause serious real-world issues, such as misidentifying people and misrepresenting ethnicities in search results.
Abstract
From face recognition on smartphones to automatic routing in self-driving cars, machine vision algorithms lie at the core of these features. These systems solve image-based tasks by identifying and understanding objects, then making decisions based on this information. However, errors in datasets are usually carried over into, or even magnified by, algorithms, at times resulting in issues such as recognising black people as gorillas and misrepresenting ethnicities in search results. This paper tracks the errors in datasets and their impacts, revealing that a flawed dataset can result from limited categories, incomplete sourcing and poor classification.
Taxonomy
Topics: Video Surveillance and Tracking Methods
