Multi-representation Ensembles and Delayed SGD Updates Improve Syntax-based NMT
Danielle Saunders, Felix Stahlberg, Adrià de Gispert, Bill Byrne

TL;DR
This paper introduces ensemble and training strategies for incorporating target syntax into Neural Machine Translation, achieving state-of-the-art results on a difficult Japanese-English task.
Contribution
The paper combines ensembles over multiple sentence representations, decoded with WFST-based beam search, with a delayed SGD update training procedure for syntax-based NMT.
Findings
State-of-the-art performance on Japanese-English translation
Effective use of WFSTs for beam search over ensembles
Delayed SGD updates enhance training with long representations
Abstract
We explore strategies for incorporating target syntax into Neural Machine Translation. We specifically focus on syntax in ensembles containing multiple sentence representations. We formulate beam search over such ensembles using WFSTs, and describe a delayed SGD update training procedure that is especially effective for long representations like linearized syntax. Our approach gives state-of-the-art performance on a difficult Japanese-English task.
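The abstract does not spell out the delayed SGD update procedure. The sketch below assumes it means accumulating gradients over several mini-batches and then applying a single parameter update, which imitates a larger batch when long sequences force small ones. The one-parameter model and the names `grad_mse` and `delayed_sgd` are hypothetical illustrations, not the authors' implementation.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y ~ w * x over one mini-batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def delayed_sgd(w, batches, lr, delay):
    """Accumulate gradients over `delay` mini-batches, then apply one update.

    Assumption: a trailing group with fewer than `delay` batches is dropped.
    """
    acc, seen = 0.0, 0
    for batch in batches:
        acc += grad_mse(w, batch)  # gradient at the current (not yet updated) w
        seen += 1
        if seen == delay:
            w -= lr * acc / delay  # single step using the averaged gradient
            acc, seen = 0.0, 0
    return w


# Toy data drawn from y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# One update on the full batch of four examples.
w_full = delayed_sgd(0.0, [data], lr=0.01, delay=1)

# Two half-size batches with the update delayed until both are seen.
w_delayed = delayed_sgd(0.0, [data[:2], data[2:]], lr=0.01, delay=2)
```

With equal-sized mini-batches, the delayed update takes the same step as one update on the combined batch, so `w_full` and `w_delayed` agree here. Only the gradient accumulator has to stay in memory between batches.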
Taxonomy
Methods: Stochastic Gradient Descent
