Beyond Reproduction: A Paired-Task Framework for Assessing LLM Comprehension and Creativity in Literary Translation

Ran Zhang; Steffen Eger; Arda Tezcan; Wei Zhao; Simone Paolo Ponzetto; Lieve Macken

arXiv:2604.18169·cs.CL·April 21, 2026

TL;DR

This paper introduces a paired-task framework that evaluates how well large language models comprehend literary source texts and how creatively they translate them. It finds that model creativity falls well short of human translators, even when comprehension is strong.

Contribution

The paper presents a paired-task evaluation method that combines expert human annotation with UCP-based automatic scoring to benchmark LLMs on comprehension and creativity in literary translation.

Findings

01 Strong comprehension does not lead to human-level creativity.

02 Models often produce literal or inappropriate translations, especially in English-Chinese pairs.

03 Only one model, Mistral-Large, approaches human creativity scores.

Abstract

Large language models (LLMs) are increasingly used for creative tasks such as literary translation. Yet translational creativity remains underexplored and is rarely evaluated at scale, while source-text comprehension is typically studied in isolation, despite the fact that, in professional translation, comprehension and creativity are tightly intertwined. We address these gaps with a paired-task framework applied to literary excerpts from 11 books. Task 1 assesses source-text comprehension, and Task 2 evaluates translational creativity through Units of Creative Potential (UCPs), such as metaphors and wordplay. Using a scalable evaluation setup that combines expert human annotations with UCP-based automatic scoring, we benchmark 23 models and four creativity-oriented prompts. Our findings show that strong comprehension does not translate into human-level creativity: models often produce…
