Chain-of-Thought Reasoning Improves Context-Aware Translation with Large Language Models
Shabnam Ataee, Hugo Huart, Andrei Popescu-Belis

TL;DR
This study shows that chain-of-thought prompting improves the ability of large language models to translate texts with inter-sentential dependencies, with larger gains for models that already perform well.
Contribution
It uses chain-of-thought reasoning prompts to improve the accuracy of large language models when translating texts with complex inter-sentential dependencies.
Findings
The best models, using chain-of-thought reasoning, reach about 90% accuracy in distinguishing a correct translation from a wrong but plausible one.
On the translation task, the best models reach COMET scores of about 92%, with GPT-4, GPT-4o and Phi standing out.
Gains from reasoning are larger for models that already perform well (a "wise get wiser" effect).
Abstract
This paper assesses the ability of large language models (LLMs) to translate texts that include inter-sentential dependencies. We use the English-French DiscEvalMT benchmark (Bawden et al., 2018) with pairs of sentences containing translation challenges for pronominal anaphora and lexical cohesion. We evaluate 12 LLMs from the DeepSeek-R1, GPT, Llama, Mistral and Phi families on two tasks: (1) distinguish a correct translation from a wrong but plausible one; and (2) generate a correct translation. We compare prompts that encourage chain-of-thought reasoning with those that do not. The best models take advantage of reasoning and reach about 90% accuracy on the first task and COMET scores of about 92% on the second task, with GPT-4, GPT-4o and Phi standing out. Moreover, we observe a "wise get wiser" effect: the improvements through reasoning are larger for models that already perform…
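The first task is a contrastive evaluation: assuming the model is shown a sentence with its preceding context plus two candidate translations, it must pick the correct one, and accuracy is the fraction of correct picks. The Python sketch below illustrates how such accuracy could be computed with and without a chain-of-thought instruction. The `ContrastiveItem` fields, the prompt wording, the last-line answer-parsing rule and the `model` callable are all hypothetical illustrations, not the authors' prompts or code, and the example pair is invented rather than taken from DiscEvalMT.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ContrastiveItem:
    """Hypothetical test item: a sentence, its preceding context, and two
    candidate translations, exactly one of which is correct."""
    context: str      # previous source sentence (English)
    source: str       # sentence to translate (English)
    candidate_a: str  # first French candidate
    candidate_b: str  # second French candidate
    correct: str      # label of the correct candidate: "A" or "B"


def build_prompt(item: ContrastiveItem, reasoning: bool) -> str:
    """Build a hypothetical prompt; reasoning=True asks for step-by-step thinking first."""
    lines = [
        f"Previous sentence: {item.context}",
        f"Sentence to translate: {item.source}",
        f"Translation A: {item.candidate_a}",
        f"Translation B: {item.candidate_b}",
    ]
    if reasoning:
        lines.append(
            "Think step by step about which words depend on the previous sentence "
            "(e.g. pronoun gender), then give your final answer, A or B, on the last line."
        )
    else:
        lines.append("Reply with A or B only.")
    return "\n".join(lines)


def parse_choice(response: str) -> Optional[str]:
    """Read the answer from the last non-empty line, so reasoning above it is ignored."""
    lines = [ln for ln in response.splitlines() if ln.strip()]
    if not lines:
        return None
    for token in reversed(lines[-1].split()):
        token = token.strip(".,:;!?()\"'*")
        if token in ("A", "B"):
            return token
    return None


def contrastive_accuracy(
    items: Sequence[ContrastiveItem],
    model: Callable[[str], str],
    reasoning: bool,
) -> float:
    """Fraction of items where the parsed choice matches the correct label.
    Unparseable answers count as wrong."""
    if not items:
        raise ValueError("need at least one item")
    hits = sum(
        int(parse_choice(model(build_prompt(item, reasoning))) == item.correct)
        for item in items
    )
    return hits / len(items)


# Invented toy item (not from DiscEvalMT): "voiture" is feminine, so "Elle" is correct.
item = ContrastiveItem(
    context="The car broke down yesterday.",
    source="It was very old.",
    candidate_a="Elle était très vieille.",
    candidate_b="Il était très vieux.",
    correct="A",
)


# Stand-in "models"; in practice these would wrap calls to an LLM.
def toy_reasoner(prompt: str) -> str:
    return "'It' refers to 'the car', i.e. 'la voiture', which is feminine.\nAnswer: A"


def toy_guesser(prompt: str) -> str:
    return "B"


print(contrastive_accuracy([item], toy_reasoner, reasoning=True))   # 1.0
print(contrastive_accuracy([item], toy_guesser, reasoning=False))   # 0.0
```

Running the same model over the same items with `reasoning=True` and `reasoning=False` gives the with/without chain-of-thought contrast that the abstract describes. Scoring the second task would instead compare generated translations against references with a metric such as COMET.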
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
