Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels
Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, Ross Girshick

TL;DR
This paper addresses the challenge of learning accurate image classifiers from noisy, human-centric annotations by modeling and decoupling reporting bias, leading to significant improvements in classification and captioning tasks.
Contribution
The authors introduce a novel algorithm that models human reporting bias, enabling the extraction of visually correct labels from noisy annotations, which enhances classifier performance.
Findings
Improved accuracy in image classification and captioning tasks.
Effective modeling of human reporting bias in noisy annotations.
Doubling performance of existing methods in some cases.
Abstract
When human annotators are given a choice about what to label in an image, they apply their own subjective judgments on what to ignore and what to mention. We refer to these noisy "human-centric" annotations as exhibiting human reporting bias. Examples of such annotations include image tags and keywords found on photo sharing sites, or in datasets containing image captions. In this paper, we use these noisy annotations for learning visually correct image classifiers. Such annotations do not use consistent vocabulary, and miss a significant amount of the information present in an image; however, we demonstrate that the noise in these annotations exhibits structure and can be modeled. We propose an algorithm to decouple the human reporting bias from the correct visually grounded labels. Our results are highly interpretable for reporting "what's in the image" versus "what's worth saying."…
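The decoupling idea in the abstract can be illustrated with a small sketch. It assumes a factorization that is not spelled out in the text above: the probability that a human mentions a concept is a mixture over whether the concept is visually present, weighted by how likely annotators are to mention it in each case. All function and parameter names are hypothetical, and the numbers are made up for illustration.

```python
# Illustrative sketch only: names and the factorization are assumptions,
# not the authors' implementation.

def mention_probability(p_present, p_mention_if_present, p_mention_if_absent):
    """P(human mentions concept), marginalizing over visual presence."""
    return (p_mention_if_present * p_present
            + p_mention_if_absent * (1.0 - p_present))


def presence_given_no_mention(p_present, p_mention_if_present, p_mention_if_absent):
    """P(concept is visually present | human did NOT mention it), via Bayes' rule."""
    p_no_mention = 1.0 - mention_probability(
        p_present, p_mention_if_present, p_mention_if_absent
    )
    if p_no_mention == 0.0:
        return 0.0  # a "no mention" outcome is impossible under these inputs
    return (1.0 - p_mention_if_present) * p_present / p_no_mention


# Hypothetical numbers: a concept that is very likely in the image
# but that annotators rarely bother to mention.
p_mention = mention_probability(0.9, 0.2, 0.01)
p_present_unmentioned = presence_given_no_mention(0.9, 0.2, 0.01)
print(p_mention, p_present_unmentioned)
```

Under these made-up inputs the concept is rarely mentioned even though it is probably present, which is the gap between "what's in the image" and "what's worth saying" that the abstract describes. A classifier trained directly on the mention labels would conflate the two quantities.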
