Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
Michael Bloodgood, Chris Callison-Burch

TL;DR
This paper introduces a cost-focused active learning algorithm for statistical machine translation that solicits additional translation data efficiently, yielding an order-of-magnitude higher rate of improvement even when substantial resources already exist.
Contribution
The paper presents an active learning-style data solicitation algorithm for large-scale machine translation that counters the diminishing returns commonly encountered when substantial translation data already exists.
Findings
Order of magnitude increase in improvement rate
Effective data solicitation via Amazon Mechanical Turk
Overcomes diminishing returns in resource-rich scenarios
Abstract
We explore how to improve machine translation systems by adding more translation data in situations where we already have substantial resources. The main challenge is how to buck the trend of diminishing returns that is commonly encountered. We present an active learning-style data solicitation algorithm to meet this challenge. We test it, gathering annotations via Amazon Mechanical Turk, and find that we get an order of magnitude increase in performance rates of improvement.
Taxonomy
Topics: Machine Learning and Algorithms · Topic Modeling · Natural Language Processing Techniques
