What do Large Language Models Need for Machine Translation Evaluation?
Shenbin Qian, Archchana Sindhujan, Minnie Kabra, Diptesh Kanojia, Constantin Orăsan, Tharindu Ranasinghe, Frédéric Blain

TL;DR
This paper investigates what large language models need to evaluate machine translation quality effectively, emphasizing the importance of reference translations and prompting techniques across various languages and model sizes.
Contribution
It provides a comprehensive analysis of LLM-based MT evaluation, highlighting the role of reference data and prompting methods, and offers publicly available resources for reproducibility.
Findings
Reference translations significantly improve evaluation accuracy.
Larger models benefit more from Chain of Thought prompting.
LLMs do not always produce a numerical score, raising concerns about their reliability as evaluators.
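Because LLM evaluations arrive as free text, any pipeline that consumes them has to extract a score and decide what to do when none is given. The sketch below shows one way to do that and to measure how often extraction fails. The helper names `extract_score` and `failure_rate` are hypothetical, and the 0 to 100 scale is an assumption for illustration, not the paper's parser or scoring scheme.

```python
import re
from typing import Optional

# Matches integers and decimals, with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_score(reply: str, low: float = 0.0, high: float = 100.0) -> Optional[float]:
    """Return the first number in `reply` that lies in [low, high], or None.

    Hypothetical helper; the 0-100 default range is an assumed scale.
    """
    for match in _NUMBER.finditer(reply):
        value = float(match.group())
        if low <= value <= high:
            return value
    return None  # no usable score: treat the reply as a failed evaluation


def failure_rate(replies: list[str]) -> float:
    """Fraction of replies from which no in-range score could be extracted."""
    if not replies:
        return 0.0
    missing = sum(1 for reply in replies if extract_score(reply) is None)
    return missing / len(replies)
```

For example, `extract_score("Score: 85")` returns `85.0`, while a purely descriptive reply such as `"The translation is fluent but omits a clause."` returns `None`. Reporting the failure rate alongside correlation figures makes the reliability issue visible instead of silently dropping unscored outputs.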
Abstract
Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results comparable to fine-tuned multilingual pre-trained language models. In this paper, we explore what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate MT quality. In addition, we investigate prompting techniques such as zero-shot, Chain of Thought (CoT) and few-shot prompting for eight language pairs covering high-, medium- and low-resource languages, leveraging varying LLM variants. Our findings indicate the importance of reference translations for an LLM-based evaluation. While larger models do not necessarily fare better, they tend to benefit more from CoT…
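The abstract varies two things: which translation information goes into the prompt (source, reference, translation errors, annotation guidelines) and how the model is prompted (zero-shot, CoT, few-shot). A minimal sketch of assembling such prompts follows. The `Item` class, the `build_prompt` function, its parameters, and the prompt wording are all hypothetical and are not the prompts released with the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Item:
    """One translation to score; the optional fields are the extra information under test."""
    source: str
    translation: str
    reference: Optional[str] = None
    errors: Optional[str] = None  # free-text description of annotated errors


def _describe(item: Item) -> str:
    # Include only the information that is actually supplied.
    parts = [f"Source: {item.source}", f"Translation: {item.translation}"]
    if item.reference is not None:
        parts.append(f"Reference: {item.reference}")
    if item.errors is not None:
        parts.append(f"Errors: {item.errors}")
    return "\n".join(parts)


def build_prompt(
    item: Item,
    mode: str = "zero-shot",
    guidelines: Optional[str] = None,
    shots: Sequence[Tuple[Item, float]] = (),
) -> str:
    """Assemble a hypothetical MT-evaluation prompt for one of three prompting modes."""
    if mode not in {"zero-shot", "cot", "few-shot"}:
        raise ValueError(f"unknown mode: {mode}")
    lines = ["Rate the quality of the translation on a scale from 0 to 100."]
    if guidelines:
        lines.append(f"Annotation guidelines: {guidelines}")
    if mode == "few-shot":
        # Worked examples with their gold scores precede the item being evaluated.
        for example, score in shots:
            lines += ["", _describe(example), f"Score: {score}"]
    lines += ["", _describe(item)]
    if mode == "cot":
        # Chain of Thought: request reasoning before the final score.
        lines.append("Explain your reasoning step by step, then give the score.")
    else:
        lines.append("Score:")
    return "\n".join(lines)
```

Leaving `reference` as `None` yields a reference-free prompt, so the same function can contrast reference-based and reference-free settings while holding the prompting mode fixed.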
Taxonomy
Topics: Natural Language Processing Techniques
