Toward Effective Automated Content Analysis via Crowdsourcing
Jiele Wu, Chau-Wai Wong, Xinyan Zhao, Xianpeng Liu

TL;DR
This paper introduces a quality-aware crowdsourcing system for semantic content analysis that uses real-time feedback to keep annotation quality high over time. The system is validated against expert-labeled data and on machine learning tasks.
Contribution
It proposes a novel feedback mechanism to sustain worker quality in subjective annotation tasks, improving large-scale semantic data collection.
Findings
Effective in maintaining annotation quality over extended periods
Reaches 70%-80% accuracy on the machine learning tasks used for validation
Validated with expert-labeled datasets
Abstract
Many computer scientists use the aggregated answers of online workers to represent ground truth. Prior work has shown that aggregation methods such as majority voting are effective for measuring relatively objective features. For subjective features such as semantic connotation, online workers, known for optimizing their hourly earnings, tend to deteriorate in the quality of their responses as they work longer. In this paper, we aim to address this issue by proposing a quality-aware semantic data annotation system. We observe that with timely feedback on workers' performance quantified by quality scores, better informed online workers can maintain the quality of their labeling throughout an extended period of time. We validate the effectiveness of the proposed annotation system through i) evaluating performance based on an expert-labeled dataset, and ii) demonstrating machine learning…
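The abstract mentions two mechanisms: aggregating worker answers by majority voting, and giving workers feedback in the form of quality scores. A minimal Python sketch of both follows. The excerpt above does not say how the paper computes its quality score, so the sketch assumes a simple stand-in: the fraction of a worker's labels that agree with the majority label. The function names and the (worker, item, label) tuple layout are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def aggregate(annotations):
    """Group labels by item and take the majority label for each item.

    annotations: list of (worker_id, item_id, label) tuples (assumed layout).
    """
    by_item = defaultdict(list)
    for _worker, item, label in annotations:
        by_item[item].append(label)
    return {item: majority_vote(labels) for item, labels in by_item.items()}


def quality_scores(annotations, consensus):
    """Score each worker by agreement with the consensus label.

    Assumption: this agreement rate is a stand-in, not the paper's definition.
    """
    hits, totals = Counter(), Counter()
    for worker, item, label in annotations:
        totals[worker] += 1
        hits[worker] += int(label == consensus[item])
    return {worker: hits[worker] / totals[worker] for worker in totals}


annotations = [
    ("a", "x", "pos"), ("b", "x", "pos"), ("c", "x", "neg"),
    ("a", "y", "neg"), ("b", "y", "neg"), ("c", "y", "neg"),
]
consensus = aggregate(annotations)
scores = quality_scores(annotations, consensus)
```

In a feedback loop like the one the abstract describes, scores of this kind would be recomputed as new labels arrive and shown to each worker. For subjective labels, agreement with the majority is only a rough proxy for quality, which is one reason a real system may prefer expert-labeled items as the reference.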
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Mobile Crowdsensing and Crowdsourcing · Topic Modeling
