Online Learning Meets Machine Translation Evaluation: Finding the Best Systems with the Least Human Effort
Vânia Mendonça (1, 2), Ricardo Rei (1, 2, 3), Luisa Coheur (1, 2), Alberto Sardinha (1, 2), Ana Lúcia Santos (4, 5) ((1) INESC-ID Lisboa, (2) Instituto Superior Técnico, (3) Unbabel AI, (4) Centro de Linguística da Universidade de Lisboa

TL;DR
This paper applies online learning to an ensemble of machine translation systems, using whatever human feedback is available to converge on the best systems with less human evaluation effort.
Contribution
It presents a novel online learning framework that leverages limited human feedback to rapidly find top-performing translation systems among an ensemble.
Findings
Converges quickly to the top-3 ranked systems for the WMT'19 language pairs considered
Targets the cost of human evaluation when multiple systems must be compared
Remains effective even though many translations lack human feedback
Abstract
In Machine Translation, assessing the quality of a large number of automatic translations can be challenging. Automatic metrics are not reliable when it comes to high-performing systems. In addition, resorting to human evaluators can be expensive, especially when evaluating multiple systems. To overcome the latter challenge, we propose a novel application of online learning that, given an ensemble of Machine Translation systems, dynamically converges to the best systems by taking advantage of the human feedback available. Our experiments on WMT'19 datasets show that our online approach quickly converges to the top-3 ranked systems for the language pairs considered, despite the lack of human feedback for many translations.
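To make the idea in the abstract concrete, here is a minimal sketch of one way an online learner could weight an ensemble of systems from sparse human scores. It uses a generic exponentially weighted (Hedge-style) update, which is an assumption for illustration and not necessarily the algorithm used in the paper. The function names `update_weights` and `rank_systems`, the learning rate `eta`, and the toy scores are all hypothetical.

```python
import math


def update_weights(rounds, n_systems, eta=0.5):
    """Exponentially weighted update over an ensemble of MT systems.

    rounds: list of rounds; each round is a list of length n_systems holding
            a human score in [0, 1], or None when no human score exists.
    eta:    learning rate (hypothetical default chosen for illustration).
    Returns the normalized weight of each system after all rounds.
    """
    weights = [1.0 / n_systems] * n_systems
    for round_scores in rounds:
        for i, score in enumerate(round_scores):
            if score is None:
                continue  # assumption: no feedback means no update
            loss = 1.0 - score  # higher human score means lower loss
            weights[i] *= math.exp(-eta * loss)
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


def rank_systems(weights):
    """Return system indices ordered from highest to lowest weight."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)


if __name__ == "__main__":
    # Toy data (made up): three systems, feedback missing for some of them
    # in every round.
    toy_rounds = [[0.9, 0.5, None], [0.8, None, 0.3]] * 10
    final = update_weights(toy_rounds, n_systems=3)
    print(rank_systems(final))
```

One design choice worth noting: skipping the update when a score is missing means a system that is rarely judged keeps a relatively high weight. A real setup would need some policy for this case, and the sketch does not attempt to reproduce whatever the paper does.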
