Do LLMs Truly Benefit from Longer Context in Automatic Post-Editing?

Ahrii Kim; Seong-heum Kim

arXiv:2601.19410·cs.CL·March 13, 2026

TL;DR

This paper systematically evaluates large language models for automatic post-editing, revealing that while proprietary models perform well with simple prompts, they do not effectively utilize document context and are costly, highlighting the need for more efficient solutions.

Contribution

It provides a comprehensive comparison of proprietary and open-weight LLMs for document-level APE, analyzing their performance, robustness, and limitations in context utilization.

Findings

01. Proprietary LLMs achieve near human-level APE quality with simple prompts.

02. Open-weight models are less robust but better at exploiting document context.

03. Current automatic metrics do not reliably reflect qualitative improvements.

Abstract

Automatic post-editing (APE) aims to refine machine translations by correcting residual errors. Although recent large language models (LLMs) demonstrate strong translation capabilities, their effectiveness for APE--especially under document-level context--remains insufficiently understood. We present a systematic comparison of proprietary and open-weight LLMs under a naive document-level prompting setup, analyzing APE quality, contextual behavior, robustness, and efficiency. Our results show that proprietary LLMs achieve near human-level APE quality even with simple one-shot prompting, regardless of whether document context is provided. While these models exhibit higher robustness to data poisoning attacks than open-weight counterparts, this robustness also reveals a limitation: they largely fail to exploit document-level context for contextual error correction. Furthermore, standard…
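
To make the "naive document-level prompting setup" with one-shot prompting more concrete, the following is a minimal sketch of how such a prompt could be assembled, with and without document context. It is not the authors' prompt: the function name `build_ape_prompt`, the instruction wording, the field labels, and the choice of preceding source sentences as context are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def build_ape_prompt(
    source: str,
    mt: str,
    example: Tuple[str, str, str],
    context: Optional[Sequence[str]] = None,
) -> str:
    """Assemble a one-shot APE prompt (hypothetical template, not from the paper).

    example: one (source, machine translation, post-edit) demonstration.
    context: optional surrounding sentences from the same document.
    """
    ex_src, ex_mt, ex_pe = example
    parts = [
        # Assumed instruction wording; the paper's actual prompt is not shown here.
        "Post-edit the machine translation so that it is accurate and fluent.",
        "",
        "Example",
        f"Source: {ex_src}",
        f"MT: {ex_mt}",
        f"Post-edit: {ex_pe}",
        "",
    ]
    if context:
        # Document-level variant: prepend neighbouring sentences as plain text.
        parts.append("Document context (preceding source sentences):")
        parts.extend(f"- {sentence}" for sentence in context)
        parts.append("")
    # The segment to be post-edited always comes last.
    parts += [f"Source: {source}", f"MT: {mt}", "Post-edit:"]
    return "\n".join(parts)


if __name__ == "__main__":
    demo = ("Guten Tag.", "Good day to.", "Good day.")
    print(build_ape_prompt("Er kam an.", "He came on.", demo,
                           context=["Peter fuhr nach Berlin."]))
```

Calling the function with `context=None` yields the sentence-level variant, so the same segment can be sent under both conditions and the outputs compared, which is the kind of contrast the abstract describes.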

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Adversarial Robustness in Machine Learning