Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy
Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, Olga Russakovsky

TL;DR
This paper investigates sources of bias in the 'people' subtree of ImageNet. It identifies three issues: the stagnant concept vocabulary of WordNet, the attempt to illustrate every category exhaustively with images, and unequal representation in the images. It then proposes initial mitigation strategies.
Contribution
It analyzes key factors in ImageNet's 'people' categories that may lead to biased or problematic behavior in downstream models, and offers first steps towards more balanced and fair datasets.
Findings
Identified vocabulary stagnation in WordNet's 'people' categories
Highlighted representation inequality in images across categories
Proposed initial mitigation strategies for dataset bias
Abstract
Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in these datasets are critical to the models' behavior. In this paper, we examine ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods. We consider three key factors within the "person" subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology: (1) the stagnant concept vocabulary of WordNet, (2) the attempt at exhaustive illustration of all categories with images, and (3) the inequality of…
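To make the idea of "balancing the distribution" in the title concrete, here is a minimal, hypothetical sketch of one generic way to equalize representation. It assumes each image carries an annotated attribute under a key such as "group". It downsamples every attribute value to the size of the smallest group. This is not the authors' procedure, which this summary does not describe. The function name `balance_by_attribute` and the "group" key are illustrative assumptions.

```python
import random
from collections import defaultdict


def balance_by_attribute(images, attribute, seed=0):
    """Return a subset in which every value of `attribute` occurs equally often.

    Hypothetical sketch: `images` is a list of dicts and `attribute` is an
    assumed annotation key. Each group is downsampled (without replacement)
    to the size of the smallest group, so no image is duplicated.
    """
    # Bucket images by their attribute value.
    groups = defaultdict(list)
    for img in images:
        groups[img[attribute]].append(img)
    if not groups:
        return []

    # The smallest group determines how many images each group keeps.
    target = min(len(members) for members in groups.values())

    # A seeded RNG makes the subsample reproducible.
    rng = random.Random(seed)
    balanced = []
    for value in sorted(groups):
        balanced.extend(rng.sample(groups[value], target))
    return balanced


if __name__ == "__main__":
    # Six images in group "a", two in group "b": an unequal distribution.
    images = [{"id": i, "group": "a"} for i in range(6)]
    images += [{"id": i, "group": "b"} for i in range(6, 8)]
    subset = balance_by_attribute(images, "group")
    print(len(subset))  # 4 (two images from each group)
```

Downsampling discards data rather than duplicating it. This keeps every remaining image unique but shrinks the category. Matching a non-uniform target distribution would require a different choice of per-group counts.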
Taxonomy
Topics: Misinformation and Its Impacts · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
