Managing Cognitive Bias in Human Labeling Operations for Rare-Event AI: Evidence from a Field Experiment

Gunnar P. Epping; Andrew Caplin; Erik Duhaime; William R. Holmes; Daniel Martin; Jennifer S. Trueblood

arXiv:2603.11511 · cs.HC · March 13, 2026

TL;DR

This study investigates cognitive biases in human labeling of rare events and demonstrates that balanced feedback, probabilistic elicitation, and recalibration techniques improve label quality and AI model performance in a medical crowdsourcing context.

Contribution

The paper provides empirical evidence on bias-mitigation strategies in human annotation for rare-event detection. It also introduces a pipeline-level recalibration method whose recalibrated labels improve the performance of downstream AI models.

Findings

01. Balanced feedback reduces rare-event misses

02. Probabilistic elicitation improves label calibration

03. Recalibration enhances downstream CNN performance
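One standard way to read the first finding is through signal detection theory, where a lower believed prevalence of positives raises a worker's decision threshold. The sketch below is a hedged textbook illustration, not the authors' model. It assumes an equal-variance Gaussian observer (negatives ~ N(0, 1), positives ~ N(d', 1)) who minimizes expected errors with equal error costs. The discriminability d' = 1.5 and the function names are hypothetical. It compares a worker who believes positives occur 20% of the time with one who believes they occur 50% of the time.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def optimal_criterion(prevalence: float, d_prime: float) -> float:
    """Evidence threshold that minimizes expected errors (equal error costs)
    for an equal-variance Gaussian observer who *believes* positives occur
    at rate `prevalence`. Negatives ~ N(0, 1), positives ~ N(d_prime, 1)."""
    # Respond "positive" when the likelihood ratio exceeds (1 - pi) / pi.
    # For equal-variance Gaussians the log-likelihood ratio at x is
    # d' * x - d'^2 / 2, so the rule becomes x > log(beta) / d' + d' / 2.
    log_beta = math.log((1.0 - prevalence) / prevalence)
    return log_beta / d_prime + d_prime / 2.0


def miss_rate(criterion: float, d_prime: float) -> float:
    """P(evidence < criterion | positive)."""
    return normal_cdf(criterion - d_prime)


def false_alarm_rate(criterion: float) -> float:
    """P(evidence >= criterion | negative)."""
    return 1.0 - normal_cdf(criterion)


if __name__ == "__main__":
    d_prime = 1.5  # hypothetical discriminability of a single worker
    for believed in (0.2, 0.5):
        c = optimal_criterion(believed, d_prime)
        print(f"believed prevalence {believed:.0%}: criterion {c:.2f}, "
              f"miss rate {miss_rate(c, d_prime):.2f}, "
              f"false-alarm rate {false_alarm_rate(c):.2f}")
```

Lowering the believed prevalence raises the threshold, which trades fewer false alarms for more misses. When misses are the costly error, as in rare-event detection, feedback that makes positives look more common (50% instead of 20%) pulls the threshold back down. In this illustration, that is the mechanism by which balanced feedback would reduce misses.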

Abstract

Many operational AI systems depend on large-scale human annotation to detect rare but consequential events (e.g., fraud, defects, and medical abnormalities). When positives are rare, the prevalence effect induces systematic cognitive biases that inflate misses and can propagate through the AI lifecycle via biased training labels. We analyze prior experimental evidence and run a field experiment on DiagnosUs, a medical crowdsourcing platform, in which we hold the true prevalence in the unlabeled stream fixed (20% blasts) while varying (i) the prevalence of positives in the gold-standard feedback stream (20% vs. 50%) and (ii) the response interface (binary labels vs. elicited probabilities). We then post-process probabilistic labels using a linear-in-log-odds recalibration approach at the worker and crowd levels, and train convolutional neural networks on the resulting labels. Balanced…
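The linear-in-log-odds (LLO) recalibration step in the abstract can be sketched as follows. One common parameterization maps an elicited probability p to a recalibrated probability q through logit(q) = γ·logit(p) + τ. The abstract does not give the paper's exact parameterization or estimation procedure, so the names `fit_llo`, `apply_llo`, `gamma` (γ) and `tau` (τ) are hypothetical. Fitting by maximum likelihood against gold-standard labels, which amounts to a logistic regression of the gold label on logit(p), is also an assumption.

```python
import numpy as np


def logit(p: np.ndarray) -> np.ndarray:
    """Log-odds of an array of probabilities."""
    return np.log(p) - np.log1p(-p)


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Inverse of logit."""
    return 1.0 / (1.0 + np.exp(-z))


def _design(p, eps: float) -> np.ndarray:
    # Clip so that elicited 0% or 100% answers get finite log-odds,
    # then build columns [logit(p), 1] for slope (gamma) and intercept (tau).
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.column_stack([logit(p), np.ones_like(p)])


def _loglik(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    # Bernoulli log-likelihood, written stably as sum(y*z - log(1 + e^z)).
    z = X @ w
    return float(np.sum(y * z - np.logaddexp(0.0, z)))


def fit_llo(p, y, eps: float = 1e-6, n_iter: int = 100, tol: float = 1e-10):
    """Fit (gamma, tau) in logit(q) = gamma * logit(p) + tau by maximum
    likelihood against gold-standard labels y (0/1), i.e. a logistic
    regression of y on logit(p), solved with damped Newton steps."""
    X = _design(p, eps)
    y = np.asarray(y, dtype=float)
    w = np.array([1.0, 0.0])  # start at the identity map (no recalibration)
    ll = _loglik(w, X, y)
    for _ in range(n_iter):
        q = sigmoid(X @ w)
        grad = X.T @ (y - q)
        # Negative Hessian of the log-likelihood, with a tiny ridge for safety.
        hess = X.T @ (X * (q * (1.0 - q))[:, None]) + 1e-9 * np.eye(2)
        step = np.linalg.solve(hess, grad)
        # Backtrack: halve the step until the log-likelihood does not drop.
        t = 1.0
        w_new = w + step
        ll_new = _loglik(w_new, X, y)
        while ll_new < ll and t > 1e-8:
            t /= 2.0
            w_new = w + t * step
            ll_new = _loglik(w_new, X, y)
        if ll_new < ll:
            break  # no further ascent possible; keep the current estimate
        done = np.max(np.abs(w_new - w)) < tol
        w, ll = w_new, ll_new
        if done:
            break
    return float(w[0]), float(w[1])


def apply_llo(p, gamma: float, tau: float, eps: float = 1e-6) -> np.ndarray:
    """Map elicited probabilities p to recalibrated probabilities q."""
    return sigmoid(_design(p, eps) @ np.array([gamma, tau]))
```

Under these assumptions, worker-level recalibration fits one (γ, τ) pair per worker on that worker's gold-standard responses. Crowd-level recalibration fits a single pooled map. Either map is then applied to responses on the unlabeled stream before they serve as CNN training labels. In log-odds space, γ < 1 pulls reports toward 50% and τ > 0 shifts them upward. An upward shift is the correction needed if low prevalence leads workers to underreport positives.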

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)