Truth Discovery in Sequence Labels from Crowds
Nasim Sabetpour, Adithya Kulkarni, Sihong Xie, Qi Li

TL;DR
This paper introduces AggSLC, an optimization-based method for accurately inferring true sequence labels from crowdsourced annotations, effectively handling dependencies and worker reliability in NLP tasks.
Contribution
The paper proposes AggSLC, an aggregation algorithm for sequential labels that jointly considers task characteristics, worker reliability, and machine learning techniques. The algorithm is proven to converge and outperforms existing aggregation methods.
Findings
Outperforms existing aggregation methods on NER and biomedical datasets.
Demonstrates effective handling of sequential dependencies and worker reliability.
Shows that the proposed algorithm converges after a finite number of iterations.
Abstract
Annotation quality and quantity positively affect the learning performance of sequence labeling, a vital task in Natural Language Processing. Hiring domain experts to annotate a corpus is very costly in terms of money and time. Crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), have been deployed to assist with this purpose. However, the annotations collected this way are prone to human errors due to the lack of expertise of the crowd workers. Existing literature in annotation aggregation assumes that annotations are independent and thus faces challenges when handling sequential label aggregation tasks with complex dependencies. To address these challenges, we propose an optimization-based method that infers the ground truth labels using annotations provided by workers for sequential labeling tasks. The proposed Aggregation method for Sequential Labels from Crowds (AggSLC)…
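To make the aggregation setting concrete, here is a minimal sketch of a generic token-wise truth-discovery loop. It is not AggSLC: it treats every token independently, which is exactly the assumption the abstract says breaks down for sequence labels. The function names, the smoothing constant `eps`, and the negative-log weighting rule are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import defaultdict


def weighted_vote(annotations, weights):
    """Pick, for each token, the label with the largest total worker weight."""
    n_tokens = len(next(iter(annotations.values())))
    result = []
    for t in range(n_tokens):
        scores = defaultdict(float)
        for worker, labels in annotations.items():
            scores[labels[t]] += weights[worker]
        # Iterate labels in sorted order so ties are broken deterministically.
        result.append(max(sorted(scores), key=scores.get))
    return result


def aggregate(annotations, n_iters=10, eps=1e-6):
    """Alternate between estimating labels and estimating worker weights.

    annotations: dict mapping worker id -> list of labels, one per token.
    Returns (estimated_labels, worker_weights).
    Hypothetical baseline: tokens are aggregated independently.
    """
    weights = {w: 1.0 for w in annotations}  # start from plain majority vote
    truth = weighted_vote(annotations, weights)
    for _ in range(n_iters):
        # Error rate of each worker against the current label estimate.
        errors = {
            w: sum(a != b for a, b in zip(labels, truth)) / len(truth)
            for w, labels in annotations.items()
        }
        total = sum(errors.values()) + eps * len(errors)
        # Assumed weighting rule: lower error share gives a higher weight.
        weights = {w: -math.log((e + eps) / total) for w, e in errors.items()}
        new_truth = weighted_vote(annotations, weights)
        if new_truth == truth:  # label estimate stopped changing
            break
        truth = new_truth
    return truth, weights


crowd = {
    "w1": ["B-PER", "I-PER", "O", "B-LOC"],
    "w2": ["B-PER", "I-PER", "O", "O"],
    "w3": ["B-PER", "O", "O", "B-LOC"],
}
labels, reliability = aggregate(crowd)
print(labels)
print(reliability)
```

Because each token is voted on separately, nothing in this baseline prevents an invalid tag sequence such as `O` followed by `I-PER`. That gap is what a sequence-aware method has to close.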
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Machine Learning and Data Classification · Data Stream Mining Techniques
