MTUncertainty: Assessing the Need for Post-editing of Machine Translation Outputs by Fine-tuning OpenAI LLMs
Serge Gladkoff, Lifeng Han, Gleb Erofeev, Irina Sorokina, Goran, Nenadic

TL;DR
This paper investigates the use of fine-tuned OpenAI large language models, particularly GPT-3.5, for automatic translation quality evaluation to determine if post-editing is necessary, across multiple language pairs.
Contribution
It demonstrates that fine-tuned GPT-3.5 performs well in translation quality prediction, and shows that larger model sizes do not necessarily improve performance.
Findings
Fine-tuned GPT-3.5 achieves good accuracy in TQE.
Increasing LLM size does not significantly enhance TQE performance.
Performance varies across different language pairs.
Abstract
Translation Quality Evaluation (TQE) is an essential step of the modern translation production process. TQE is critical in assessing both machine translation (MT) and human translation (HT) quality without reference translations. The ability to evaluate or even simply estimate the quality of translation automatically may open significant efficiency gains through process optimisation. This work examines whether the state-of-the-art large language models (LLMs) can be used for this purpose. We take OpenAI models as the best state-of-the-art technology and approach TQE as a binary classification task. On eight language pairs including English to Italian, German, French, Japanese, Dutch, Portuguese, Turkish, and Chinese, our experimental results show that fine-tuned gpt3.5 can demonstrate good performance on translation quality prediction tasks, i.e. whether the translation needs to be…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling
