Generalizable Error Modeling for Human Data Annotation: Evidence From an Industry-Scale Search Data Annotation Program
Heinrich Peters, Alireza Hashemi, James Rae

TL;DR
This study develops a generalizable error prediction model for human data annotation in search relevance tasks, demonstrating its effectiveness across multiple applications and improving annotation quality and efficiency.
Contribution
The paper introduces a task-agnostic error prediction model trained on behavioral and task features, outperforming prior label-based approaches and applicable across diverse industry-scale annotation tasks.
Findings
The error prediction model achieves moderate performance (AUC of 0.65-0.75).
The model generalizes across applications: a global, task-agnostic model performs on par with task-specific models.
Using the model increases annotation correction efficiency by up to 40%.
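To make the AUC figure and the efficiency claim more concrete, the following is a minimal sketch of how an error score could be evaluated and used to prioritize annotations for review. It is not the authors' implementation: the function names `roc_auc` and `errors_caught` are hypothetical, and the labels and scores are invented toy values, not data from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen erroneous annotation
    (label 1) receives a higher score than a randomly chosen correct one
    (label 0). Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def errors_caught(labels, scores, budget):
    """Number of true errors found when only the `budget` highest-scored
    annotations are sent for review."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:budget])


# Toy data (assumed for illustration): 1 = annotation was an error.
labels = [1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]

auc = roc_auc(labels, scores)
caught = errors_caught(labels, scores, budget=3)
print(f"AUC = {auc:.2f}, errors caught in top 3 = {caught} of {sum(labels)}")
```

Under these assumptions, reviewing annotations in descending score order surfaces more errors per reviewed item than a random sample would on average, which is the intuition behind using an error model to make correction more efficient.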
Abstract
Machine learning (ML) and artificial intelligence (AI) systems rely heavily on human-annotated data for training and evaluation. A major challenge in this context is the occurrence of annotation errors, as their effects can degrade model performance. This paper presents a predictive error model trained to detect potential errors in search relevance annotation tasks for three industry-scale ML applications (music streaming, video streaming, and mobile apps). Drawing on real-world data from an extensive search relevance annotation program, we demonstrate that errors can be predicted with moderate model performance (AUC=0.65-0.75) and that model performance generalizes well across applications (i.e., a global, task-agnostic model performs on par with task-specific models). In contrast to past research, which has often focused on predicting annotation labels from task-specific features, our…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Data Stream Mining Techniques · Machine Learning and Data Classification
