TL;DR
BabelDOC is an Intermediate Representation (IR)-based framework for translating PDFs. It preserves the original layout, improves the visual quality of the translated pages, and keeps terminology consistent across the document.
Contribution
The paper presents a layout-preserving PDF translation method built on an intermediate representation, and reports better layout fidelity and visual aesthetics than existing approaches.
Findings
BabelDOC outperforms baselines in layout fidelity and terminology consistency.
Human and multimodal LLM evaluations favor BabelDOC's translation quality.
The open-source toolkit has gained significant community engagement with over 8.4K GitHub stars.
Abstract
As global cross-lingual communication intensifies, language barriers in visually rich documents such as PDFs remain a practical bottleneck. Existing document translation pipelines face a tension between linguistic processing and layout preservation: text-oriented Computer-Assisted Translation (CAT) systems often discard structural metadata, while document parsers focus on extraction and do not support faithful re-rendering after translation. We introduce BabelDOC, an Intermediate Representation (IR)-based framework for layout-preserving PDF translation. BabelDOC decouples visual layout metadata from semantic content, enabling document-level translation operations such as terminology extraction, cross-page context handling, glossary-constrained generation, and formula placeholdering. The translated content is then re-anchored to the original layout through an adaptive typesetting engine.…
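The decoupling described in the abstract can be illustrated with a minimal Python sketch. It is not BabelDOC's actual code or API: the `Block` class, the `{F0}`-style placeholder format, the `$...$` formula pattern, and the `translate` callback are all hypothetical names chosen for illustration. The idea shown is only that layout metadata (page, bounding box) travels alongside the text untouched, while formulas are swapped for placeholders before translation and restored afterwards.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical IR node: layout metadata is stored apart from the text."""
    page: int
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates
    text: str


# Assumption for this sketch: inline formulas are delimited by single dollar signs.
FORMULA = re.compile(r"\$[^$]+\$")


def mask_formulas(text):
    """Replace each formula with a numbered placeholder such as {F0}."""
    formulas = []

    def stash(match):
        formulas.append(match.group(0))
        return f"{{F{len(formulas) - 1}}}"

    return FORMULA.sub(stash, text), formulas


def unmask_formulas(text, formulas):
    """Put the original formulas back in place of their placeholders."""
    for i, formula in enumerate(formulas):
        text = text.replace(f"{{F{i}}}", formula)
    return text


def translate_block(block, translate):
    """Translate only the text; page and bbox are carried over unchanged."""
    masked, formulas = mask_formulas(block.text)
    translated = translate(masked)
    return Block(block.page, block.bbox, unmask_formulas(translated, formulas))
```

Here `translate` stands in for any translation backend that accepts and returns a string. A real system would also need to handle placeholders that the backend drops or alters, and would re-fit the translated text to the bounding box, which this sketch does not attempt.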
