How many labelers do you have? A closer look at gold-standard labels
Chen Cheng, Hilal Asi, John Duchi

TL;DR
This paper challenges the standard practice of creating gold-standard labels by aggregating multiple annotations. It shows that access to non-aggregated labels can make training well-calibrated models more feasible, and can speed convergence when the estimator is faithful to (or can learn) the true labeling process.
Contribution
The paper develops a stylized theoretical model of the collect-then-aggregate labeling pipeline and analyzes its statistical consequences, identifying when non-aggregated labels improve learning.
Findings
Estimators that use all labels can be better calibrated and converge faster, provided they are faithful to (or can learn) the true labeling process.
Estimators that use aggregated labels are robust but converge more slowly.
Predictions validated on real datasets support the theory.
Abstract
The construction of most supervised learning datasets revolves around collecting multiple labels for each instance, then aggregating the labels to form a type of "gold-standard". We question the wisdom of this pipeline by developing a (stylized) theoretical model of this process and analyzing its statistical consequences, showing how access to non-aggregated label information can make training well-calibrated models more feasible than it is with gold-standard labels. The entire story, however, is subtle, and the contrasts between aggregated and fuller label information depend on the particulars of the problem, where estimators that use aggregated information exhibit robust but slower rates of convergence, while estimators that can effectively leverage all labels converge more quickly if they have fidelity to (or can learn) the true labeling process. The theory makes several predictions…
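To make the contrast in the abstract concrete, here is a minimal simulation sketch. It is not the authors' model or code: it assumes a logistic labeling process with a hypothetical true parameter `theta_true`, `m = 5` independent labelers per instance, and plain gradient descent. All names and parameter values are illustrative assumptions. It fits the same logistic model twice, once on majority-vote labels and once on the per-instance fraction of positive votes, which is equivalent to using every individual label.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, targets, steps=2000, lr=0.5):
    """Gradient descent on the mean logistic loss.

    `targets` may be hard labels in {0, 1} or soft labels in [0, 1];
    with soft labels equal to the vote fraction, this matches the
    average logistic loss over every individual label.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ theta) - targets) / len(targets)
        theta -= lr * grad
    return theta


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Assumed (illustrative) setup: n instances, d features, m labelers each.
n, d, m = 2000, 5, 5
rng = np.random.default_rng(0)

theta_true = rng.normal(size=d)
theta_true *= 2.0 / np.linalg.norm(theta_true)  # fix the signal strength at 2

X = rng.normal(size=(n, d))
p = sigmoid(X @ theta_true)

# Each labeler votes independently with the same probability p_i.
labels = (rng.random((n, m)) < p[:, None]).astype(float)

majority = (labels.mean(axis=1) > 0.5).astype(float)  # aggregated label
vote_fraction = labels.mean(axis=1)                   # keeps all label information

theta_majority = fit_logistic(X, majority)
theta_full = fit_logistic(X, vote_fraction)

for name, est in [("majority", theta_majority), ("all labels", theta_full)]:
    print(f"{name:>10}: norm={np.linalg.norm(est):.2f}, "
          f"cosine to truth={cosine(est, theta_true):.3f}")
```

Under these assumptions, both fits should point in nearly the same direction as `theta_true`, but the majority-vote fit should have a noticeably larger norm, which means overconfident predicted probabilities. Majority voting removes label noise that a logistic model needs to observe in order to be calibrated.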
Taxonomy
Topics: Machine Learning and Data Classification · Music and Audio Processing · Machine Learning and Algorithms
Methods: Test
