Goldilocks: Consistent Crowdsourced Scalar Annotations with Relative Uncertainty

Quanze Chen; Daniel S. Weld; Amy X. Zhang

arXiv:2108.01799·cs.HC·August 5, 2021

TL;DR

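Goldilocks is a crowd rating method that grounds rating scales with examples and uses a two-step bounding process to place each item within a range. This improves rating consistency and captures uncertainty, and it is tested in three domains: comment toxicity, food satiety, and age estimation from portraits.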

Contribution

The paper introduces an elicitation technique for collecting calibrated scalar annotations that distinguishes ambiguity inherent to the item from disagreement between annotators.

Findings

1. Improves consistency in subjective rating domains.

2. Captures different sources of uncertainty with item ranges.

3. Enhances estimates of pairwise relationship distributions.

Abstract

Human ratings have become a crucial resource for training and evaluating machine learning systems. However, traditional elicitation methods for absolute and comparative rating suffer from issues with consistency and often do not distinguish between uncertainty due to disagreement between annotators and ambiguity inherent to the item being rated. In this work, we present Goldilocks, a novel crowd rating elicitation technique for collecting calibrated scalar annotations that also distinguishes inherent ambiguity from inter-annotator disagreement. We introduce two main ideas: grounding absolute rating scales with examples and using a two-step bounding process to establish a range for an item's placement. We test our designs in three domains: judging toxicity of online comments, estimating satiety of food depicted in images, and estimating age based on portraits. We show that (1) Goldilocks…
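
To make the distinction between inherent ambiguity and inter-annotator disagreement concrete, here is a minimal sketch of one way range annotations could be summarized. It is not the paper's aggregation method: the function name `summarize_ranges`, the use of range midpoints, and the three summary statistics are all assumptions chosen for illustration. Each annotator is assumed to supply a lower and an upper bound for an item on a shared scale.

```python
from statistics import mean, pstdev


def summarize_ranges(ranges):
    """Summarize per-annotator (lower, upper) bounds for one item.

    Hypothetical helper, not taken from the paper. Returns:
      estimate     - mean of the range midpoints
      ambiguity    - mean range width (how unsure each annotator is)
      disagreement - population std. dev. of the midpoints
                     (how much annotators differ from each other)
    """
    if not ranges:
        raise ValueError("at least one range is required")
    for lower, upper in ranges:
        if lower > upper:
            raise ValueError(f"lower bound {lower} exceeds upper bound {upper}")

    midpoints = [(lower + upper) / 2 for lower, upper in ranges]
    widths = [upper - lower for lower, upper in ranges]

    return {
        "estimate": mean(midpoints),
        "ambiguity": mean(widths),
        "disagreement": pstdev(midpoints),
    }


# Two annotators give the same wide range: ambiguous item, no disagreement.
ambiguous_item = summarize_ranges([(20, 30), (20, 30)])

# Two annotators give different point ratings: no ambiguity, high disagreement.
contested_item = summarize_ranges([(10, 10), (30, 30)])
```

Under these assumptions, wide ranges that overlap suggest the item itself is hard to place, while narrow ranges that sit far apart suggest the annotators disagree. A single scalar rating per annotator cannot separate the two cases.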
