Diverse Pretrained Context Encodings Improve Document Translation
Domenic Donato, Lei Yu, Chris Dyer

TL;DR
This paper introduces a document translation model that conditions on multiple pretrained document context signals drawn from source and target context, improving translation quality and sample efficiency on the NIST Chinese-English and the IWSLT and WMT English-German tasks.
Contribution
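The paper presents a new architecture that adapts a sentence-level sequence-to-sequence transformer by incorporating multiple pretrained document context signals. It also assesses how translation performance depends on the pretraining approach used to generate these signals, the quantity of parallel data with document context, and whether the model conditions on source context, target context, or both.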
Findings
Pretrained context representations improve sample efficiency.
Adequate parallel data resources are crucial for learning to use document context.
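Jointly conditioning on multiple context representations outperforms any single representation.
Source context is more valuable for translation performance than target-side context.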
Abstract
We propose a new architecture for adapting a sentence-level sequence-to-sequence transformer by incorporating multiple pretrained document context signals and assess the impact on translation performance of (1) different pretraining approaches for generating these signals, (2) the quantity of parallel data for which document context is available, and (3) conditioning on source, target, or source and target contexts. Experiments on the NIST Chinese-English, and IWSLT and WMT English-German tasks support four general conclusions: that using pretrained context representations markedly improves sample efficiency, that adequate parallel data resources are crucial for learning to use document context, that jointly conditioning on multiple context representations outperforms any single representation, and that source context is more valuable for translation performance than target side…
