Do LLMs Truly Benefit from Longer Context in Automatic Post-Editing?
Ahrii Kim, Seong-heum Kim

TL;DR
This paper systematically evaluates large language models for automatic post-editing, revealing that while proprietary models perform well with simple prompts, they do not effectively utilize document context and are costly, highlighting the need for more efficient solutions.
Contribution
It provides a comprehensive comparison of proprietary and open-weight LLMs for document-level APE, analyzing their performance, robustness, and limitations in context utilization.
Findings
Proprietary LLMs achieve near human-level APE quality with simple prompts.
Open-weight models are less robust but better at exploiting document context.
Current automatic metrics do not reliably reflect qualitative improvements.
Abstract
Automatic post-editing (APE) aims to refine machine translations by correcting residual errors. Although recent large language models (LLMs) demonstrate strong translation capabilities, their effectiveness for APE, especially under document-level context, remains insufficiently understood. We present a systematic comparison of proprietary and open-weight LLMs under a naive document-level prompting setup, analyzing APE quality, contextual behavior, robustness, and efficiency. Our results show that proprietary LLMs achieve near human-level APE quality even with simple one-shot prompting, regardless of whether document context is provided. While these models exhibit higher robustness to data poisoning attacks than open-weight counterparts, this robustness also reveals a limitation: they largely fail to exploit document-level context for contextual error correction. Furthermore, standard…
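The abstract contrasts one-shot APE prompting with and without document context. As a rough illustration only, the sketch below shows one way such a prompt could be assembled, optionally prepending preceding source/MT segment pairs as context. The instruction wording, the prompt layout, and the `build_ape_prompt` helper are hypothetical assumptions, not the authors' actual prompt or code.

```python
from typing import Optional, Sequence, Tuple


def build_ape_prompt(
    source: str,
    mt_output: str,
    example: Tuple[str, str, str],
    context: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Assemble a hypothetical one-shot APE prompt.

    example: one (source, MT, post-edit) demonstration triple.
    context: optional preceding (source, MT) pairs from the same document;
             when None or empty, the prompt is sentence-level only.
    """
    ex_src, ex_mt, ex_pe = example
    # Assumed instruction and one-shot demonstration (illustrative wording).
    parts = [
        "Correct any errors in the machine translation. "
        "Output only the post-edited translation.",
        "",
        "Example:",
        f"Source: {ex_src}",
        f"MT: {ex_mt}",
        f"Post-edit: {ex_pe}",
        "",
    ]
    # Naive document-level setup: simply prepend earlier segments as plain text.
    if context:
        parts.append("Document context (preceding segments):")
        for ctx_src, ctx_mt in context:
            parts.append(f"Source: {ctx_src}")
            parts.append(f"MT: {ctx_mt}")
        parts.append("")
    # The segment to post-edit; the model is expected to continue after "Post-edit:".
    parts += [f"Source: {source}", f"MT: {mt_output}", "Post-edit:"]
    return "\n".join(parts)


if __name__ == "__main__":
    demo = ("Guten Morgen.", "Good tomorrow.", "Good morning.")
    print(build_ape_prompt("Sie wartete.", "She waited.", demo,
                           context=[("Er kam spät.", "He came late.")]))
```

Comparing outputs for the same segment with `context=None` and with preceding segments supplied is one simple way to probe whether a model actually uses the added document context.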
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Adversarial Robustness in Machine Learning
