A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls

Sheikh Shafayat; Dongkeun Yoon; Woori Jang; Jiwoo Choi; Alice Oh; Seohyon Jung

arXiv:2412.01340 · cs.CL · September 15, 2025


TL;DR

This paper introduces a two-stage framework for evaluating English-to-Korean literary machine translation. Its metrics are more interpretable and align better with human judgments than traditional metrics, but they still struggle with cultural nuances.

Contribution

The study presents a novel two-step pipeline for literary translation evaluation, improving correlation with human assessments and highlighting limitations of current metrics.

Findings

1. The framework provides fine-grained, interpretable metrics.
2. It correlates more highly with human judgment than traditional metrics.
3. Evaluating cultural aspects and Korean honorifics remains challenging.

Abstract

In this work, we propose a two-stage pipeline for fine-grained evaluation of English-to-Korean literary machine translation and assess its feasibility. The results show that our framework provides fine-grained, interpretable metrics suited to literary translation and correlates more highly with human judgment than traditional machine translation metrics. Nonetheless, it still falls short of inter-human agreement, especially on metrics such as Korean Honorifics. We also observe that LLMs tend to favor translations generated by other LLMs, and we highlight the need for more sophisticated evaluation methods to ensure accurate and culturally sensitive machine translation of literary works.
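The abstract judges the framework by two comparisons: how well its scores correlate with human judgment, and how that correlation compares with agreement between human annotators. This page does not say which correlation statistic the paper uses, so the sketch below assumes Kendall's tau-b, one common choice for segment-level metric evaluation. The function name `kendall_tau_b` and the score lists `human_a`, `human_b`, and `metric` are hypothetical, and the scores are invented for illustration; they are not data from the paper.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length score lists (ties handled)."""
    if len(x) != len(y):
        raise ValueError("score lists must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    # Compare every pair of items once.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: tau-b ignores this pair
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1  # both lists order the pair the same way
        else:
            discordant += 1  # the lists disagree on the pair's order
    denom = math.sqrt((concordant + discordant + tied_y_only)
                      * (concordant + discordant + tied_x_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical 1-5 scores for five translated segments (invented values).
human_a = [4, 2, 5, 3, 1]  # annotator A
human_b = [5, 2, 4, 3, 1]  # annotator B
metric = [3, 1, 4, 4, 2]   # an automatic metric

print(round(kendall_tau_b(human_a, human_b), 3))  # inter-human agreement → 0.8
print(round(kendall_tau_b(human_a, metric), 3))   # metric vs. human → 0.527
```

In this toy example the metric-human correlation falls below the inter-human agreement, which is the kind of gap the abstract describes. Tau-b is used here because rating scales produce many ties; a real study might instead report Spearman or Pearson correlation.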


Code & Models

Datasets

skshafayat/two-step-lit-eval


Taxonomy

Topics: Natural Language Processing Techniques