COMET-QE and Active Learning for Low-Resource Machine Translation

Everlyn Asiko Chimoto; Bruce A. Bassett

arXiv:2210.15696·cs.CL·October 31, 2022


TL;DR

This paper demonstrates that using COMET-QE, a reference-free evaluation metric, to select sentences for active learning significantly improves low-resource neural machine translation, outperforming Round Trip Translation Likelihood (RTTL) and random sentence selection.
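Round Trip Translation Likelihood (RTTL), one of the baselines named above, can be illustrated with a minimal sketch. It assumes a forward model that translates a source sentence into the target language and a backward model that scores how likely the original source is given that translation. Sentences that the round trip reconstructs poorly are treated as the most informative. The length normalization, the lowest-score-first rule and every name here (`rttl`, `select_by_rttl`, `toy_translate`, `toy_back_logprobs`) are illustrative assumptions. This page does not describe the paper's two RTTL variants.

```python
import math
from typing import Callable, List, Sequence


def rttl(src: str,
         translate: Callable[[str], str],
         back_token_logprobs: Callable[[str, str], List[float]]) -> float:
    """Length-normalized round-trip log-likelihood of `src` (assumed form).

    `translate` is a placeholder forward model (source -> target).
    `back_token_logprobs` is a placeholder backward model returning
    log P(x_t | x_<t, y) for each source token x_t given the translation y.
    """
    hyp = translate(src)
    logps = back_token_logprobs(src, hyp)
    return sum(logps) / max(len(logps), 1)


def select_by_rttl(sources: Sequence[str],
                   translate: Callable[[str], str],
                   back_token_logprobs: Callable[[str, str], List[float]],
                   k: int) -> List[int]:
    """Return indices of the k sentences with the lowest RTTL (worst round trip)."""
    scored = sorted((rttl(s, translate, back_token_logprobs), i)
                    for i, s in enumerate(sources))
    return [i for _, i in scored[:k]]


# Toy stand-ins: the "forward model" drops words longer than 6 characters, and
# the "backward model" gives 0.9 to source words that survive the round trip,
# 0.1 to words that were lost.
def toy_translate(src: str) -> str:
    return " ".join(w for w in src.split() if len(w) <= 6)


def toy_back_logprobs(src: str, hyp: str) -> List[float]:
    kept = set(hyp.split())
    return [math.log(0.9 if w in kept else 0.1) for w in src.split()]


pool = ["habari yako", "ninaenda sokoni", "watoto wanacheza mpira"]
print(select_by_rttl(pool, toy_translate, toy_back_logprobs, k=2))  # [1, 2]
```

In a real setup both callables would wrap trained NMT models in opposite directions; the toy versions only make the scoring and ranking logic runnable.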

Contribution

It proposes COMET-QE as a sentence-selection criterion for active learning in low-resource machine translation and shows that it outperforms RTTL-based and random selection.
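A minimal sketch of reference-free, QE-based sentence selection follows. The current NMT model translates each unlabeled source sentence, a quality-estimation scorer rates the (source, translation) pair without any reference, and the k lowest-rated sentences are sent for human translation. `translate` and `qe_score` are hypothetical placeholders (a real setup would plug in the NMT model and a COMET-QE checkpoint, for example through Unbabel's COMET library), and picking the lowest scores is an assumption about the selection rule rather than a detail confirmed on this page.

```python
from typing import Callable, List, Sequence


def select_by_qe(sources: Sequence[str],
                 translate: Callable[[str], str],
                 qe_score: Callable[[str, str], float],
                 k: int) -> List[int]:
    """Return indices of the k sources whose translations get the lowest QE score.

    `translate` stands in for the current NMT model and `qe_score` for a
    reference-free quality estimator such as COMET-QE (both are placeholders).
    """
    scored = []
    for i, src in enumerate(sources):
        hyp = translate(src)                    # model output for an unlabeled sentence
        scored.append((qe_score(src, hyp), i))  # scored from source + hypothesis only
    scored.sort()                               # lowest estimated quality first
    return [i for _, i in scored[:k]]


# Toy stand-ins: identity "translation" and a scorer that rates longer outputs lower.
def identity_translate(src: str) -> str:
    return src


def toy_qe(src: str, hyp: str) -> float:
    return 1.0 / (1 + len(hyp.split()))


pool = ["habari", "habari za asubuhi", "ninaenda sokoni leo asubuhi"]
print(select_by_qe(pool, identity_translate, toy_qe, k=2))  # [2, 1]
```

Sorting `(score, index)` tuples keeps ties deterministic, which makes repeated selection runs reproducible.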

Findings

01

COMET-QE outperforms two RTTL variants and random selection by up to 5 BLEU points when 20k sentences are selected on top of a 30k-sentence baseline.

02

Active learning with COMET-QE gets more translation quality out of the same number of selected sentences, which matters most in the very low-resource setting.

03

Results are demonstrated on Swahili, Kinyarwanda, and Spanish datasets.
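The findings are reported in BLEU, so a gap of "up to 5 BLEU points" is a difference on a 0 to 100 scale. The sketch below is a plain corpus-level BLEU (clipped n-gram precisions for n = 1 to 4, geometric mean, brevity penalty, no smoothing) on whitespace tokens. It is for intuition only: it assumes a single reference per sentence and does not reproduce the tokenization of standard tools such as sacreBLEU, so its numbers are not directly comparable to the paper's.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: Sequence[str], refs: Sequence[str], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU (0-100) with one whitespace-tokenized reference each."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ng, r_ng = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_ng[g]) for g, c in h_ng.items())
            totals[n - 1] += sum(h_ng.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, an order with no matches gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses that are shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


print(round(corpus_bleu(["the cat sat on a mat"], ["the cat sat on the mat"]), 2))  # 53.73
```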

Abstract

Active learning aims to deliver maximum benefit when resources are scarce. We use COMET-QE, a reference-free evaluation metric, to select sentences for low-resource neural machine translation. Using Swahili, Kinyarwanda and Spanish for our experiments, we show that COMET-QE significantly outperforms two variants of Round Trip Translation Likelihood (RTTL) and random sentence selection by up to 5 BLEU points for 20k sentences selected by Active Learning on a 30k baseline. This suggests that COMET-QE is a powerful tool for sentence selection in the very low-resource limit.
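The setup in the abstract (a 30k-sentence baseline extended by 20k sentences chosen through active learning) can be sketched as a pool-based loop, shown here with the random-selection baseline. The pool size, the number of rounds and the `train` callback are hypothetical; a COMET-QE or RTTL strategy would plug in as another `strategy` that scores the candidates with the latest retrained model.

```python
import random
from typing import Any, Callable, List

Model = Any  # placeholder for whatever `train` returns (e.g. an NMT model)
Strategy = Callable[[Model, List[int], int], List[int]]


def random_strategy(seed: int = 0) -> Strategy:
    """Random-selection baseline: ignores the model and samples k candidates."""
    rng = random.Random(seed)

    def pick(model: Model, candidates: List[int], k: int) -> List[int]:
        return rng.sample(candidates, k)

    return pick


def run_active_learning(pool_size: int, baseline_size: int, budget: int, rounds: int,
                        strategy: Strategy, train: Callable[[List[int]], Model]) -> List[int]:
    """Grow a labeled set from `baseline_size` to `baseline_size + budget` sentences."""
    labeled = list(range(baseline_size))               # indices of the baseline corpus
    unlabeled = list(range(baseline_size, pool_size))  # candidate source sentences
    model = train(labeled)
    for r in range(rounds):
        # Spread the budget as evenly as possible over the rounds.
        k = budget // rounds + (1 if r < budget % rounds else 0)
        chosen = strategy(model, unlabeled, k)
        chosen_set = set(chosen)
        labeled.extend(chosen)                         # would be sent for human translation
        unlabeled = [i for i in unlabeled if i not in chosen_set]
        model = train(labeled)                         # retrain on the enlarged corpus
    return labeled


sizes: List[int] = []


def record_size(labeled: List[int]) -> Model:
    """Hypothetical stand-in for training: only logs the corpus size."""
    sizes.append(len(labeled))
    return None


final = run_active_learning(pool_size=100_000, baseline_size=30_000, budget=20_000,
                            rounds=4, strategy=random_strategy(seed=1), train=record_size)
print(len(final), sizes)  # 50000 [30000, 35000, 40000, 45000, 50000]
```

Keeping the strategy as a plain callable means the random, RTTL and COMET-QE runs can share the same loop and budget, so any BLEU difference comes from the selection rule alone.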


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification