Multi-representation Ensembles and Delayed SGD Updates Improve Syntax-based NMT

Danielle Saunders; Felix Stahlberg; Adria de Gispert; Bill Byrne

arXiv:1805.00456·cs.CL·May 14, 2018

TL;DR

This paper introduces ensemble and training strategies for incorporating target syntax into Neural Machine Translation (NMT), and reports state-of-the-art results on a difficult Japanese-English task.

Contribution

The paper combines ensembles over multiple sentence representations, decoded with WFST-based beam search, with a delayed SGD update training procedure that is especially effective for long representations such as linearized syntax.

Findings

01 State-of-the-art performance on Japanese-English translation

02 Effective use of WFSTs for beam search over ensembles

03 Delayed SGD updates enhance training with long representations

Abstract

We explore strategies for incorporating target syntax into Neural Machine Translation. We specifically focus on syntax in ensembles containing multiple sentence representations. We formulate beam search over such ensembles using WFSTs, and describe a delayed SGD update training procedure that is especially effective for long representations like linearized syntax. Our approach gives state-of-the-art performance on a difficult Japanese-English task.
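The abstract does not spell out the delayed SGD update procedure. A common reading is gradient accumulation: gradients from several small batches are collected and applied as a single update, which imitates a larger batch when long sequences limit how many sentences fit in memory. The sketch below illustrates that idea on a toy one-parameter regression. The names `grad_mse` and `delayed_sgd`, the averaging of accumulated gradients, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the 1-D model y = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def delayed_sgd(w, batches, lr, delay):
    """Accumulate gradients over `delay` batches, then apply one averaged update.

    Hypothetical helper: averaging (rather than summing) is an assumption here.
    """
    acc, count = 0.0, 0
    for batch in batches:
        acc += grad_mse(w, batch)  # gradient is always taken at the current w
        count += 1
        if count == delay:
            w -= lr * acc / delay  # single update for the whole group of batches
            acc, count = 0.0, 0
    if count:
        # Apply whatever is left if the number of batches is not a multiple of `delay`.
        w -= lr * acc / count
    return w


# Toy data generated from y = 2x (made up for this example).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# Two batches of two examples with a delay of 2 ...
w_delayed = delayed_sgd(0.0, [data[0:2], data[2:4]], lr=0.01, delay=2)
# ... versus one batch of four examples updated immediately.
w_large = delayed_sgd(0.0, [data], lr=0.01, delay=1)
```

With equal-sized batches, the averaged accumulated gradient matches the gradient of one large batch, so both calls take the same step from the same starting point. This is the sense in which delaying updates stands in for a larger batch size.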

Taxonomy

Methods: Stochastic Gradient Descent