TL;DR
This paper explores how model size impacts multilingual machine translation evaluation metrics and demonstrates that distillation can significantly improve performance while reducing model complexity.
Contribution
It introduces a distillation approach for multilingual metrics that balances model capacity and multilinguality, improving performance with fewer parameters.
Findings
Model size limits cross-lingual transfer in evaluation metrics.
Distillation with synthetic data helps smaller models overcome this bottleneck.
The distilled model reaches 92.6% of RemBERT's performance with one-third of its parameters.
Abstract
Recent developments in machine translation and multilingual text generation have led researchers to adopt trained metrics such as COMET or BLEURT, which treat evaluation as a regression problem and use representations from multilingual pre-trained models such as XLM-RoBERTa or mBERT. Yet studies on related tasks suggest that these models are most efficient when they are large, which is costly and impractical for evaluation. We investigate the trade-off between multilinguality and model capacity with RemBERT, a state-of-the-art multilingual language model, using data from the WMT Metrics Shared Task. We present a series of experiments which show that model size is indeed a bottleneck for cross-lingual transfer, then demonstrate how distillation can help address this bottleneck by leveraging synthetic data generation and transferring knowledge from one teacher to multiple students…
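As a rough illustration of the idea in the abstract (one teacher labels synthetic data, and several smaller students are each trained by regression on those labels), here is a toy sketch in plain Python. It is not the paper's method or code: the teacher is a fixed stand-in function rather than RemBERT, the students are linear models, and every name here (TEACHER_WEIGHTS, teacher_score, synthetic_inputs, the two "groups") is a hypothetical chosen for the example.

```python
import math
import random

# Hypothetical weights for a stand-in teacher; not taken from the paper.
TEACHER_WEIGHTS = [0.6, -0.3, 0.1]


def teacher_score(x):
    """Stand-in for a large trained metric: a fixed nonlinear scorer."""
    return math.tanh(sum(w * xi for w, xi in zip(TEACHER_WEIGHTS, x)))


def synthetic_inputs(n, shift, rng):
    """Random feature vectors; `shift` loosely mimics a different data group."""
    return [[rng.uniform(-0.5, 0.5) + shift for _ in TEACHER_WEIGHTS]
            for _ in range(n)]


def predict(student, x):
    """Score one input with a linear student given as (weights, bias)."""
    weights, bias = student
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train_student(inputs, targets, epochs=100, lr=0.1):
    """Fit a linear student to teacher scores with per-example SGD on squared error."""
    weights, bias = [0.0] * len(TEACHER_WEIGHTS), 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            err = predict((weights, bias), x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def mse(preds, targets):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


rng = random.Random(0)
results = {}
# One teacher, one student per (hypothetical) group of inputs.
for group, shift in {"group_a": 0.0, "group_b": 0.3}.items():
    train_x = synthetic_inputs(200, shift, rng)
    test_x = synthetic_inputs(100, shift, rng)
    train_y = [teacher_score(x) for x in train_x]  # teacher labels synthetic data
    test_y = [teacher_score(x) for x in test_x]
    student = train_student(train_x, train_y)
    mean_y = sum(train_y) / len(train_y)
    results[group] = {
        "student_mse": mse([predict(student, x) for x in test_x], test_y),
        # Baseline: always predict the mean teacher score seen in training.
        "baseline_mse": mse([mean_y] * len(test_y), test_y),
    }
```

In this toy setting each student only has to match the teacher on its own group of inputs, which is the intuition behind training several specialized students from one teacher. How the paper actually generates synthetic data and assigns students is not specified in the abstract above.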
Taxonomy
Methods: mBERT
