MT-LENS: An all-in-one Toolkit for Better Machine Translation Evaluation
Javier García Gilabert, Carlos Escolano, Audrey Mash, Xixian Liao, Maite Melero

TL;DR
MT-LENS is a comprehensive toolkit that enhances the evaluation of machine translation systems by covering quality, bias, toxicity, and robustness, with interactive visualization and support for diverse datasets.
Contribution
It extends an existing evaluation framework, LM-eval-harness, to cover multiple aspects of MT performance, offering a unified, user-friendly platform for thorough assessment.
Findings
Supports diverse evaluation metrics and datasets
Enables analysis of gender bias, added toxicity, and robustness to misspellings
Provides interactive visualization tools
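String-overlap metrics are among the simplest of the metric families such a toolkit covers. As a rough illustration, the sketch below computes a sentence-level chrF score (character n-gram F-score, n = 1..6, recall weighted by beta = 2) from scratch. It is a simplified, hypothetical implementation written for this summary, not MT-LENS code, and its handling of short strings differs from reference implementations such as sacreBLEU.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # Remove all whitespace before extracting character n-grams, as chrF commonly does.
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (illustrative sketch only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over the n-gram orders that were scored.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta > 1 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(chrf("the cat sat on the mat", "the cat sat on a mat"))
```

An identical hypothesis and reference score 100, and strings sharing no characters score 0.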
Abstract
We introduce MT-LENS, a framework designed to evaluate Machine Translation (MT) systems across a variety of tasks, including translation quality, gender bias detection, added toxicity, and robustness to misspellings. While several toolkits have become very popular for benchmarking the capabilities of Large Language Models (LLMs), existing evaluation tools often lack the ability to thoroughly assess the diverse aspects of MT performance. MT-LENS addresses these limitations by extending the capabilities of LM-eval-harness for MT, supporting state-of-the-art datasets and a wide range of evaluation metrics. It also offers a user-friendly platform to compare systems and analyze translations with interactive visualizations. MT-LENS aims to broaden access to evaluation strategies that go beyond traditional translation quality evaluation, enabling researchers and engineers to better understand…
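A robustness-to-misspellings evaluation typically perturbs the source sentences, translates both the clean and the noisy versions, and compares quality scores. The sketch below shows one possible noise model: swapping two adjacent inner characters in a fraction of words. The function name, the swap-based noise, and the `rate`/`seed` parameters are hypothetical choices for illustration; the abstract does not specify how MT-LENS generates misspellings.

```python
import random


def add_misspellings(sentence: str, rate: float = 0.1, seed: int = 0) -> str:
    """Hypothetical noise model: swap adjacent inner characters in some words."""
    rng = random.Random(seed)  # fixed seed so perturbations are reproducible
    noisy = []
    for word in sentence.split(" "):
        # Only perturb words longer than 3 characters, with probability `rate`.
        if len(word) > 3 and rng.random() < rate:
            # Pick i so that positions i and i+1 are both inner characters;
            # the first and last characters are never moved.
            i = rng.randrange(1, len(word) - 2)
            chars = list(word)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            word = "".join(chars)
        noisy.append(word)
    return " ".join(noisy)


clean = "machine translation systems should handle noisy input"
print(add_misspellings(clean, rate=0.5, seed=1))
```

Fixing the seed makes the noisy test set reproducible, so score differences between clean and noisy inputs reflect the system rather than the random perturbation.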
Taxonomy
Topics: Natural Language Processing Techniques
