Towards Chapter-to-Chapter Context-Aware Literary Translation via Large Language Models
Linghao Jin, Li An, Xuezhe Ma

TL;DR
This paper introduces a new chapter-to-chapter context-aware translation setting for literary texts, creates a novel Chinese-English literature dataset, and demonstrates that finetuning large language models significantly improves translation quality in this challenging domain.
Contribution
The paper defines the chapter-to-chapter (Ch2Ch) translation setting, curates a Chinese-English literary dataset of 160 books, and shows that finetuning LLMs improves translation quality under this setting.
Findings
Ch2Ch translation is more challenging than sentence-level translation.
Finetuning LLMs yields significant quality improvements.
Literary translation requires specialized models and decoding strategies.
Abstract
Discourse phenomena in existing document-level translation datasets are sparse, which has been a fundamental obstacle in the development of context-aware machine translation models. Moreover, most existing document-level corpora and context-aware machine translation methods rely on an unrealistic assumption on sentence-level alignments. To mitigate these issues, we first curate a novel dataset of Chinese-English literature, which consists of 160 books with intricate discourse structures. Then, we propose a more pragmatic and challenging setting for context-aware translation, termed chapter-to-chapter (Ch2Ch) translation, and investigate the performance of commonly-used machine translation models under this setting. Furthermore, we introduce a potential approach of finetuning large language models (LLMs) within the domain of Ch2Ch literary translation, yielding impressive improvements…
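To make the Ch2Ch setting concrete, here is a minimal sketch of how chapter-level training pairs might be represented when no sentence-level alignment is assumed. The `ChapterPair` class, the `build_chapter_pairs` helper, and the toy text are hypothetical illustrations and are not taken from the paper or its dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChapterPair:
    """One hypothetical Ch2Ch example: a whole chapter and its translation."""
    source_zh: str  # full Chinese chapter, no sentence segmentation assumed
    target_en: str  # full English chapter


def build_chapter_pairs(zh_chapters: List[str], en_chapters: List[str]) -> List[ChapterPair]:
    """Pair chapters by position; alignment is assumed only at the chapter level."""
    if len(zh_chapters) != len(en_chapters):
        raise ValueError("both sides must have the same number of chapters")
    return [ChapterPair(zh, en) for zh, en in zip(zh_chapters, en_chapters)]


# Toy, made-up data: the two sides need not contain the same number of sentences.
zh_book = ["他走了。她没有回头。", "第二天,雨停了。"]
en_book = ["He left, and she did not look back.", "The next day, the rain stopped."]
pairs = build_chapter_pairs(zh_book, en_book)
```

In the first toy pair, two Chinese sentences correspond to a single English sentence, which is the kind of mismatch a sentence-aligned corpus cannot represent directly.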
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
