No Need to Sacrifice Data Quality for Quantity: Crowd-Informed Machine Annotation for Cost-Effective Understanding of Visual Data
Christopher Klugmann, Rafid Mahmood, Guruprasad Hegde, Amit Kale, and Daniel Kondermann

TL;DR
This paper introduces a machine annotation framework that ensures high-quality visual data labeling by predicting crowd responses and human uncertainty, significantly reducing costs without sacrificing reliability, especially in safety-critical applications.
Contribution
It presents a novel approach using posterior distributions over soft labels with a Dirichlet prior to automate and improve quality control in visual data annotation.
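The sketch below illustrates what a posterior over soft labels with a Dirichlet prior can look like. It uses the standard Dirichlet-categorical conjugate update: prior concentrations plus observed vote counts give the posterior. The posterior mean can serve as a soft label, and its per-class variance shrinks as votes accumulate. It is a minimal sketch, not the paper's implementation. The function names and the uniform prior in the usage example are illustrative assumptions.

```python
import numpy as np

def dirichlet_posterior(alpha_prior, votes):
    """Conjugate update: Dirichlet(alpha) prior + categorical vote counts
    -> Dirichlet(alpha + counts). `votes` holds class indices (hypothetical helper)."""
    alpha_prior = np.asarray(alpha_prior, dtype=float)
    counts = np.bincount(np.asarray(votes, dtype=int), minlength=alpha_prior.size)
    return alpha_prior + counts

def soft_label(alpha):
    """Posterior mean of Dirichlet(alpha), usable as a soft label."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()

def soft_label_variance(alpha):
    """Per-class posterior variance: m_i * (1 - m_i) / (alpha_0 + 1)."""
    alpha = np.asarray(alpha, dtype=float)
    mean = alpha / alpha.sum()
    return mean * (1.0 - mean) / (alpha.sum() + 1.0)

# Uniform prior over three answers, then three workers answer 0, 0, 2.
alpha = dirichlet_posterior([1.0, 1.0, 1.0], [0, 0, 2])
```

Here `alpha` is `[3, 1, 2]`, so the soft label is `[0.5, 1/6, 1/3]`. Each additional vote increases the total concentration and therefore lowers the variance, which is one way to quantify how settled a label is.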
Findings
Automates a large portion of annotation tasks, saving costs by over 50%.
Accurately predicts human uncertainty, aiding in filtering difficult examples.
Posterior distributions serve as priors, reducing the need for multiple human labelers.
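A toy stopping rule can illustrate how a machine-informed prior reduces the number of human labelers. This is a minimal sketch under assumptions the summary does not state. The network's predicted response distribution is scaled into Dirichlet pseudo-counts by a hypothetical `strength` parameter. Human votes are then requested one at a time until the posterior mean of the leading class reaches a hypothetical `confidence` threshold. Items that never reach the threshold within `max_votes` are flagged, loosely mirroring the idea of filtering difficult examples.

```python
import numpy as np

def machine_prior(probs, strength):
    """Hypothetical: convert predicted crowd-response probabilities into
    Dirichlet concentrations worth `strength` pseudo-votes in total."""
    probs = np.asarray(probs, dtype=float)
    return strength * probs / probs.sum()

def collect_votes(prior_alpha, vote_stream, confidence=0.75, max_votes=5):
    """Add human votes (class indices) to the Dirichlet prior until the
    leading class's posterior mean reaches `confidence`.
    Returns (votes_used, posterior_mean, confident)."""
    alpha = np.asarray(prior_alpha, dtype=float).copy()
    used = 0
    limit = min(max_votes, len(vote_stream))
    while alpha.max() / alpha.sum() < confidence and used < limit:
        alpha[vote_stream[used]] += 1.0  # one more human response
        used += 1
    mean = alpha / alpha.sum()
    # confident == False marks an item that stayed ambiguous (a "difficult" example).
    return used, mean, bool(mean.max() >= confidence)

votes = [0, 0, 0, 0, 0]  # every worker answers class 0
used_flat, _, _ = collect_votes([1.0, 1.0], votes)
used_machine, _, _ = collect_votes(machine_prior([0.7, 0.3], strength=4.0), votes)
```

With these toy numbers, the flat prior needs two votes and the machine-informed prior needs one. Real savings would depend on how well calibrated the network is. An overconfident prior (large `strength`) can also drown out genuine human disagreement.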
Abstract
Labeling visual data is expensive and time-consuming. Crowdsourcing systems promise to enable highly parallelizable annotations through the participation of monetarily or otherwise motivated workers, but even this approach has its limits. The solution: replace manual work with machine work. But how reliable are machine annotators? Sacrificing data quality for high throughput cannot be acceptable, especially in safety-critical applications such as autonomous driving. In this paper, we present a framework that enables quality checking of visual data at large scales without sacrificing the reliability of the results. We ask annotators simple questions with discrete answers, which can be highly automated using a convolutional neural network trained to predict crowd responses. Unlike the methods of previous work, which aim to directly predict soft labels to address human uncertainty, we use…
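A network "trained to predict crowd responses" needs a loss over discrete answers. The abstract is cut off before the details, so the following is an assumption rather than the paper's method. One common choice treats every crowd vote as a categorical sample and minimizes the negative log-likelihood of the observed votes under the network's softmax output. Per item, this equals cross-entropy against the empirical vote distribution. The sketch computes that loss in NumPy for a batch of logits; the CNN itself is omitted.

```python
import numpy as np

def crowd_vote_nll(logits, vote_counts):
    """Per-item cross-entropy against the empirical crowd vote distribution,
    averaged over the batch (each item weighted equally).

    logits:      (batch, classes) raw network outputs
    vote_counts: (batch, classes) number of workers choosing each answer
    """
    logits = np.asarray(logits, dtype=float)
    counts = np.asarray(vote_counts, dtype=float)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Empirical vote distribution per item.
    target = counts / counts.sum(axis=-1, keepdims=True)
    return float(-(target * log_probs).sum(axis=-1).mean())

# One image, two possible answers, three workers chose answer 0 and one chose answer 1.
loss_uniform = crowd_vote_nll([[0.0, 0.0]], [[3, 1]])
loss_matched = crowd_vote_nll([[np.log(0.75), np.log(0.25)]], [[3, 1]])
```

`loss_uniform` equals ln 2. `loss_matched` is lower because the predicted distribution matches the crowd's 3:1 split. This behavior is what lets such a network reproduce human uncertainty rather than a single hard label.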
Taxonomy
Topics: Data Visualization and Analytics · Data-Driven Disease Surveillance · Image and Video Quality Assessment
