What Does LLM Refinement Actually Improve? A Systematic Study on Document-Level Literary Translation

Shaomu Tan; Dawei Zhu; Ke Tran; Michael Denkowski; Sony Trenous; Bill Byrne; Leonardo Ribeiro; Felix Hieber

arXiv:2605.13368 · cs.CL · May 14, 2026

TL;DR

This systematic study investigates how iterative self-refinement in large language models affects document-level literary translation, revealing that simple, general refinement strategies improve fluency, style, and terminology more reliably than targeted error correction.

Contribution

The paper provides a comprehensive analysis of document-level LLM refinement strategies, identifying effective pipelines and clarifying both their impact on translation quality and their limitations.

Findings

1. Document-level MT followed by segment-level refinement is most effective.
2. General refinement prompts outperform error-specific prompts.
3. Refinement improves fluency, style, and terminology, but improves adequacy less.

Abstract

Iterative self-refinement is a simple inference-time strategy for machine translation: an LLM revises its own translation over multiple passes. Yet document-scale refinement remains poorly understood: 1) which pipelines work best, 2) which quality dimensions improve, and 3) how refiners behave. In this paper, we present a systematic study of document-level literary translation, covering nine LLMs and seven language pairs. Across nine translation-refinement granularity combinations and five refinement strategies, we find a robust recipe: document-level MT followed by segment-level refinement yields strong and stable improvements. In contrast, document-level refinement often makes fewer edits and leads to smaller or less reliable gains. Beyond granularity, a simple general refinement prompt consistently outperforms error-specific prompting and evaluate-then-refine schemes.…
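The recipe described in the abstract (translate the whole document in one pass, then refine segment by segment with a general prompt) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the prompt wording in `GENERAL_REFINE_PROMPT`, the line-per-segment alignment scheme, and the `fake_llm` stub are all hypothetical. Segments are refined in isolation here for simplicity; a real refiner may also pass surrounding document context.

```python
from typing import Callable, List

# Hypothetical LLM interface: takes a prompt string, returns generated text.
LLM = Callable[[str], str]

# Hypothetical "general" refinement prompt (not the paper's exact wording):
# it asks for an overall improvement rather than targeting specific error types.
GENERAL_REFINE_PROMPT = (
    "Here is a source text and its translation into {tgt}.\n"
    "Source: {src}\nTranslation: {hyp}\n"
    "Improve the translation. Output only the revised translation."
)


def translate_document(llm: LLM, segments: List[str], tgt: str) -> List[str]:
    """Document-level MT: translate all segments in a single call, one per line."""
    prompt = (
        f"Translate each line into {tgt}, keeping one output line per input line:\n"
        + "\n".join(segments)
    )
    out = llm(prompt).split("\n")
    # Assumption: alignment is recovered by line count; fail loudly if it breaks.
    if len(out) != len(segments):
        raise ValueError("segment alignment lost in document-level output")
    return out


def refine_segments(
    llm: LLM, segments: List[str], hyps: List[str], tgt: str, passes: int = 1
) -> List[str]:
    """Segment-level refinement: revise each hypothesis separately, `passes` times."""
    for _ in range(passes):
        hyps = [
            llm(GENERAL_REFINE_PROMPT.format(tgt=tgt, src=s, hyp=h)).strip()
            for s, h in zip(segments, hyps)
        ]
    return hyps


def doc_mt_then_seg_refine(
    llm: LLM, segments: List[str], tgt: str, passes: int = 1
) -> List[str]:
    """Document-level translation followed by segment-level refinement."""
    hyps = translate_document(llm, segments, tgt)
    return refine_segments(llm, segments, hyps, tgt, passes)


def fake_llm(prompt: str) -> str:
    """Toy stand-in for a real model, used only to exercise the control flow."""
    if prompt.startswith("Translate"):
        # "Translate" every input line by bracketing it.
        return "\n".join(f"[{line}]" for line in prompt.split("\n")[1:])
    # "Refine" by appending a marker to the current hypothesis.
    hyp = prompt.split("Translation: ")[1].split("\n")[0]
    return hyp + "*"
```

With the toy stub, `doc_mt_then_seg_refine(fake_llm, ["a", "b"], "German", passes=2)` translates both segments in one call and then applies two refinement passes per segment. Keeping translation at document level while refining per segment mirrors the split the abstract reports as most reliable; swapping `refine_segments` for a single whole-document refinement call would give the document-level refinement variant it contrasts against.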