OOWL500: Overcoming Dataset Collection Bias in the Wild

Brandon Leung; Chih-Hui Ho; Amir Persekian; David Orozco; Yen Chang; Erik Sandstrom; Bo Liu; Nuno Vasconcelos

arXiv:2108.10992·cs.CV·August 26, 2021

Open Access

TL;DR

This paper introduces OOWL500, a large 'in the lab' image dataset collected by drone under controlled conditions, which helps identify and reduce biases in object recognition models and improves their robustness.

Contribution

The paper presents a scalable drone-based data collection method, creating the OOWL500 dataset, and demonstrates its effectiveness in reducing biases and enhancing object recognition.

Findings

01

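OOWL500 contains 120,000 images of 500 objects, making it the largest 'in the lab' image dataset available when both the number of classes and the number of objects per class are considered.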

02

Augmenting wild datasets with in-lab data reduces biases and improves generalization.

03

Camera shake and pose diversity are crucial for robust object recognition.

Abstract

The hypothesis that image datasets gathered online "in the wild" can produce biased object recognizers, e.g. preferring professional photography or certain viewing angles, is studied. A new "in the lab" data collection infrastructure is proposed, consisting of a drone which captures images as it circles around objects. Crucially, the control provided by this setup and the natural camera shake inherent to flight mitigate many biases. Its inexpensive and easily replicable nature may also lead to a scalable data collection effort by the vision community. The procedure's usefulness is demonstrated by creating a dataset of Objects Obtained With fLight (OOWL). Denoted as OOWL500, it contains 120,000 images of 500 objects and is the largest "in the lab" image dataset available when both number of classes and objects per class are considered. Furthermore, it has enabled several of…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning