Automatic Input Rewriting Improves Translation with Large Language Models

Dayeon Ki; Marine Carpuat

arXiv:2502.16682 · cs.CL · September 3, 2025

TL;DR

This paper investigates whether automatically rewriting inputs with large language models improves machine translation quality, and finds that text simplification is the most effective rewriting strategy when translating from English into 6 target languages.

Contribution

It provides an empirical study of 21 input rewriting methods with 3 open-weight LLMs, translating from English into 6 target languages, and shows that text simplification is the most effective MT-agnostic strategy and that quality estimation can improve it further.

Findings

01

Text simplification is the most effective rewriting strategy.

02

Using quality estimation to assess translatability further improves simplification-based rewriting.

03

Human evaluation confirms that simplified rewrites and their MT outputs largely preserve the original meaning.

Abstract

Can we improve machine translation (MT) with LLMs by rewriting their inputs automatically? Users commonly rely on the intuition that well-written text is easier to translate when using off-the-shelf MT systems. LLMs can rewrite text in many ways, but in the context of MT, these capabilities have been primarily exploited to rewrite outputs via post-editing. We present an empirical study of 21 input rewriting methods with 3 open-weight LLMs for translating from English into 6 target languages. We show that text simplification is the most effective MT-agnostic rewrite strategy and that it can be improved further when using quality estimation to assess translatability. Human evaluation further confirms that simplified rewrites and their MT outputs both largely preserve the original meaning of the source and MT. These results suggest LLM-assisted input rewriting as a promising direction for…
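
The idea of using quality estimation to assess translatability can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the authors' implementation: the function name `select_rewrite` and the callables `translate` and `qe_score` are hypothetical placeholders for an MT system and a reference-free quality estimation model, and scoring each candidate against its own translation is an assumption.

```python
from typing import Callable, Iterable, Tuple


def select_rewrite(
    source: str,
    rewrites: Iterable[str],
    translate: Callable[[str], str],
    qe_score: Callable[[str, str], float],
) -> Tuple[str, str, float]:
    """Pick the input (original or rewrite) whose translation scores highest.

    `translate` and `qe_score` are hypothetical stand-ins supplied by the
    caller; higher `qe_score` is assumed to mean better estimated quality.
    Returns (chosen_input, its_translation, its_score).
    """
    # The original source is always a candidate, so a rewrite is chosen
    # only when its estimated quality is strictly higher.
    best_input = source
    best_mt = translate(source)
    best_score = qe_score(source, best_mt)

    for candidate in rewrites:
        mt = translate(candidate)
        # Assumption: the candidate is scored against its own translation.
        score = qe_score(candidate, mt)
        if score > best_score:
            best_input, best_mt, best_score = candidate, mt, score

    return best_input, best_mt, best_score
```

Keeping the original source in the candidate pool means the selection never does worse than no rewriting under the chosen score. Whether the score itself tracks true translation quality depends on the quality estimation model used.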

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies