Evaluating Discourse Phenomena in Neural Machine Translation
Rachel Bawden, Rico Sennrich, Alexandra Birch, Barry Haddow

TL;DR
This paper introduces discourse-specific test sets and evaluates neural machine translation models' ability to handle discourse phenomena, revealing limited improvements from existing models and proposing new strategies for better context utilization.
Contribution
The paper presents hand-crafted discourse test sets and compares context-modelling strategies for English-to-French NMT trained on subtitles, including a novel approach that exploits the previous sentence.
Findings
Multi-encoder models improve BLEU but give limited improvement on discourse phenomena: 50% accuracy on the coreference test set and 53.5% on coherence/cohesion.
Concatenating previous and current sentences improves performance.
Multi-encoding and decoding of two sentences yields the best results.
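The concatenation strategy in the findings above can be illustrated with a small preprocessing sketch. It is not the authors' code: the separator token `<CONCAT>`, the function names `concat_with_previous` and `keep_current`, and the choice to leave the first sentence with an empty context are all assumptions made for illustration. The idea shown is that each sentence is paired with its predecessor before being passed to the model, and that when two sentences are decoded together, only the part after the separator is kept as the translation of the current sentence.

```python
from typing import List

# Hypothetical separator token; this summary does not name the one actually used.
SEP = "<CONCAT>"


def concat_with_previous(sentences: List[str], sep: str = SEP) -> List[str]:
    """Prefix each sentence with the previous one, joined by a separator token.

    The first sentence has no predecessor, so its context is left empty
    (an assumption; other conventions are possible).
    """
    out = []
    prev = ""
    for sent in sentences:
        out.append(f"{prev} {sep} {sent}".strip())
        prev = sent
    return out


def keep_current(translation: str, sep: str = SEP) -> str:
    """Keep only the text after the last separator in a two-sentence output.

    If the separator is missing, the whole string is returned unchanged.
    """
    return translation.rsplit(sep, 1)[-1].strip()


if __name__ == "__main__":
    source = ["Oh, I hate flies.", "Look, there's another one!"]
    for line in concat_with_previous(source):
        print(line)
```

Applying `concat_with_previous` to both source and target sides would give a two-sentences-in, two-sentences-out setup, while applying it to the source only gives two-in, one-out; which variant corresponds to which reported result is not specified in this summary.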
Abstract
For machine translation to tackle discourse phenomena, models must have access to extra-sentential linguistic context. There has been recent interest in modelling context in neural machine translation (NMT), but models have been principally evaluated with standard automatic metrics, poorly adapted to evaluating discourse phenomena. In this article, we present hand-crafted, discourse test sets, designed to test the models' ability to exploit previous source and target sentences. We investigate the performance of recently proposed multi-encoder NMT models trained on subtitles for English to French. We also explore a novel way of exploiting context from the previous sentence. Despite gains using BLEU, multi-encoder models give limited improvement in the handling of discourse phenomena: 50% accuracy on our coreference test set and 53.5% for coherence/cohesion (compared to a non-contextual…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
