Learning a Cost-Effective Annotation Policy for Question Answering
Bernhard Kratzwald, Stefan Feuerriegel, Huan Sun

TL;DR
This paper introduces a novel semi-supervised framework for question answering dataset annotation that learns to minimize human effort and annotation costs while improving over time.
Contribution
It proposes what the authors describe as the first cost-effective annotation policy for QA, combining a semi-supervised annotation scheme with a feedback loop in which past annotations reduce future annotation cost.
Findings
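Reduces annotation cost by up to 21.1% compared with traditional manual annotation.
Uses candidate answers suggested by the underlying QA system, so annotators only give binary feedback.
Feeds past annotations back into the system, so its performance improves as annotation proceeds.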
Abstract
State-of-the-art question answering (QA) relies upon large amounts of training data for which labeling is time-consuming and thus expensive. For this reason, customizing QA systems is challenging. As a remedy, we propose a novel framework for annotating QA datasets that entails learning a cost-effective annotation policy and a semi-supervised annotation scheme. The latter reduces the human effort: it leverages the underlying QA system to suggest potential candidate annotations. Human annotators then simply provide binary feedback on these candidates. Our system is designed such that past annotations continuously improve the future performance and thus reduce the overall annotation cost. To the best of our knowledge, this is the first paper to address the problem of annotating questions with minimal annotation cost. We compare our framework against traditional manual annotations in an extensive set…
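The trade-off described in the abstract, between asking an annotator for a yes/no judgment on a suggested candidate and asking for a full manual annotation, can be illustrated with a simple expected-cost rule. This is a minimal sketch and not the paper's learned policy: the `Costs` values, the single-candidate fallback model, and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    # Hypothetical unit costs; the paper's actual cost model is not given here.
    manual: float = 1.0  # annotating a question from scratch
    verify: float = 0.2  # giving binary feedback on one suggested candidate


def expected_suggestion_cost(p_correct: float, costs: Costs) -> float:
    """Expected cost of showing one candidate, with manual fallback on rejection."""
    return costs.verify + (1.0 - p_correct) * costs.manual


def choose_action(p_correct: float, costs: Costs) -> str:
    """Pick 'suggest' when verification is cheaper in expectation, else 'manual'."""
    if expected_suggestion_cost(p_correct, costs) < costs.manual:
        return "suggest"
    return "manual"


if __name__ == "__main__":
    costs = Costs()
    for p in (0.1, 0.5, 0.9):
        print(p, choose_action(p, costs), expected_suggestion_cost(p, costs))
```

Under these assumed costs, suggesting a candidate pays off whenever the QA system's estimated probability of being correct exceeds `verify / manual`. As the system improves from past annotations, more questions would cross that threshold, which is one way to read the claim that the annotation cost falls over time.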
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
