TL;DR
This paper examines variable-effort crowdsourcing tasks, in which the annotation effort differs from item to item, and shows that visible gold questions with feedback significantly improve annotation quality, raising accuracy by 7%.
Contribution
The study demonstrates that visible gold questions with periodic feedback effectively mitigate the quality drop seen in variable-effort crowdsourcing tasks, outperforming the other best practices tested.
Findings
Visible gold questions improve annotation accuracy by 7%.
Without feedback, annotator accuracy and recall drop as per-item effort increases.
Periodic feedback helps maintain high quality in variable-effort tasks.
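To make the effort-related finding concrete, the following is a minimal sketch of how recall could be grouped by per-item effort on a bounding-box task. It is not the paper's evaluation code: using the number of gold boxes as the effort proxy, the greedy one-to-one matching, the IoU threshold of 0.5, and the function names `iou`, `item_recall`, and `recall_by_effort` are all assumptions made for illustration.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def item_recall(gold_boxes, annotated_boxes, threshold=0.5):
    """Fraction of gold boxes matched by a distinct annotated box.

    Assumption: greedy one-to-one matching at a hypothetical IoU threshold.
    """
    if not gold_boxes:
        return 1.0
    unmatched = list(annotated_boxes)
    hits = 0
    for gold in gold_boxes:
        best = max(unmatched, key=lambda box: iou(gold, box), default=None)
        if best is not None and iou(gold, best) >= threshold:
            hits += 1
            unmatched.remove(best)  # each annotation can match only one gold box
    return hits / len(gold_boxes)


def recall_by_effort(items):
    """Mean recall per effort level.

    items: iterable of (gold_boxes, annotated_boxes) pairs.
    Assumption: effort is approximated by the number of gold boxes in the item.
    """
    buckets = defaultdict(list)
    for gold_boxes, annotated_boxes in items:
        buckets[len(gold_boxes)].append(item_recall(gold_boxes, annotated_boxes))
    return {effort: sum(vals) / len(vals) for effort, vals in sorted(buckets.items())}


if __name__ == "__main__":
    # Toy data: one single-box item fully annotated, one two-box item half annotated.
    toy_items = [
        ([(0, 0, 10, 10)], [(0, 0, 10, 10)]),
        ([(0, 0, 10, 10), (20, 20, 30, 30)], [(0, 0, 10, 10)]),
    ]
    print(recall_by_effort(toy_items))
```

If the resulting mean recall falls as the effort level rises, that is the pattern the findings describe; comparing the same table for annotators with and without gold-question feedback would show whether feedback flattens it.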
Abstract
We consider a class of variable effort human annotation tasks in which the number of labels required per item can greatly vary (e.g., finding all faces in an image, named entities in a text, bird calls in an audio recording, etc.). In such tasks, some items require far more effort than others to annotate. Furthermore, the per-item annotation effort is not known until after each item is annotated since determining the number of labels required is an implicit part of the annotation task itself. On an image bounding-box task with crowdsourced annotators, we show that annotator accuracy and recall consistently drop as effort increases. We hypothesize reasons for this drop and investigate a set of approaches to counteract it. Firstly, we benchmark on this task a set of general best-practice methods for quality crowdsourcing. Notably, only one of these methods actually improves quality: the…
