Discourse Graph Guided Document Translation with Large Language Models
Viet-Thanh Pham, Minghan Wang, Hao-Han Liao, Thuy-Trang Vu

TL;DR
TransGraph is a novel discourse-guided framework for document translation that models inter-chunk relationships with structured graphs, improving translation quality and efficiency over existing methods.
Contribution
TransGraph introduces a discourse-graph-based approach that explicitly models long-range dependencies in document translation while reducing computational overhead.
Findings
Outperforms strong baselines in translation quality.
Enhances terminology consistency across documents.
Reduces token overhead compared to existing methods.
Abstract
Adapting large language models to full document translation remains challenging due to the difficulty of capturing long-range dependencies and preserving discourse coherence throughout extended texts. While recent agentic machine translation systems mitigate context window constraints through multi-agent orchestration and persistent memory, they require substantial computational resources and are sensitive to memory retrieval strategies. We introduce TransGraph, a discourse-guided framework that explicitly models inter-chunk relationships through structured discourse graphs and selectively conditions each translation segment on relevant graph neighbourhoods rather than relying on sequential or exhaustive context. Across three document-level MT benchmarks spanning six languages and diverse domains, TransGraph consistently surpasses strong baselines in translation quality and terminology…
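The abstract says each segment is conditioned on its "relevant graph neighbourhood" rather than on sequential or exhaustive context, but it does not say how that neighbourhood is chosen. The sketch below illustrates the general idea under stated assumptions and is not TransGraph's actual method. It assumes the discourse graph is already given as undirected edges between chunk indices, and that "relevant" means any chunk within a fixed number of hops. The helpers `build_graph`, `neighbourhood`, `make_prompt`, and `context_words`, the `hops` parameter, and the prompt wording are all hypothetical. Whitespace word counts stand in as a rough proxy for tokens.

```python
from typing import Dict, List, Sequence, Set, Tuple


def build_graph(num_chunks: int, edges: Sequence[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Hypothetical: adjacency map for an undirected discourse graph over chunk indices."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(num_chunks)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def neighbourhood(adj: Dict[int, Set[int]], target: int, hops: int = 1) -> List[int]:
    """Chunk indices within `hops` edges of `target` (breadth-first), excluding target."""
    seen = {target}
    frontier = {target}
    for _ in range(hops):
        nxt: Set[int] = set()
        for node in frontier:
            nxt |= adj[node] - seen
        seen |= nxt
        frontier = nxt
    return sorted(seen - {target})


def make_prompt(chunks: List[str], adj: Dict[int, Set[int]], target: int, hops: int = 1) -> str:
    """Hypothetical prompt: condition only on graph neighbours, not on every other chunk."""
    ctx = [chunks[i] for i in neighbourhood(adj, target, hops)]
    context_block = "\n".join(f"- {c}" for c in ctx) or "- (none)"
    return (
        "Context chunks related to the current chunk:\n"
        f"{context_block}\n\n"
        f"Translate the following chunk:\n{chunks[target]}"
    )


def context_words(chunks: List[str], ids: Sequence[int]) -> int:
    """Rough context-size proxy: whitespace-separated words across the selected chunks."""
    return sum(len(chunks[i].split()) for i in ids)


if __name__ == "__main__":
    # Toy document; the edges are invented for illustration only.
    chunks = [
        "The Acme reactor was commissioned in 1998.",
        "Its output doubled within a decade.",
        "Unrelated: the town hosts an annual fair.",
        "The reactor was later decommissioned.",
    ]
    adj = build_graph(len(chunks), [(0, 1), (0, 3), (1, 3)])
    graph_ids = neighbourhood(adj, 3)        # [0, 1]
    exhaustive_ids = [0, 1, 2]               # every other chunk
    print(graph_ids)
    print(context_words(chunks, graph_ids), context_words(chunks, exhaustive_ids))  # 13 20
    print(make_prompt(chunks, adj, 3))
```

In this toy case, the graph-selected context for the last chunk skips the unrelated chunk. That is the intuition behind the claimed token savings over exhaustive context. Real gains would depend on how the graph is built, which this sketch does not attempt.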
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
