Accelerating NMT Batched Beam Decoding with LMBR Posteriors for Deployment

Gonzalo Iglesias, William Tambellini, Adrià De Gispert, Eva Hasler, and Bill Byrne

arXiv:1804.11324·cs.CL·May 1, 2018

TL;DR

This paper describes a batched beam decoding algorithm for neural machine translation (NMT) that uses lattice minimum Bayes-risk (LMBR) n-gram posteriors. LMBR still yields gains on top of strong Transformer results, and the paper examines how beam size and batching affect memory and speed in deployment.

Contribution

A batched beam decoding algorithm that integrates LMBR n-gram posteriors into NMT decoding, together with a discussion of acceleration strategies for deployment and of how beam size and batching affect memory and speed.

Findings

01. LMBR posteriors still improve translation quality on top of the best recently reported Transformer results.

02. Beam size and batching both affect decoding memory use and speed.

03. The acceleration strategies discussed target faster decoding in deployment.

Abstract

We describe a batched beam decoding algorithm for NMT with LMBR n-gram posteriors, showing that LMBR techniques still yield gains on top of the best recently reported results with Transformers. We also discuss acceleration strategies for deployment, and the effect of the beam size and batching on memory and speed.
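
To make the idea in the abstract concrete, here is a minimal sketch of one beam expansion step in which every candidate token receives, on top of its NMT log-probability, a bonus built from n-gram posteriors. It follows the general LMBR form (a per-word term plus order-dependent weights times the posterior of each n-gram that the new token completes); it is not the authors' implementation, and the exact score combination in the paper may differ. The names `ngram_bonus`, `beam_step`, `posteriors`, and `thetas` are hypothetical, and the toy posterior table stands in for posteriors that would normally be extracted from a translation lattice.

```python
import numpy as np

def ngram_bonus(prefix, token, posteriors, thetas):
    """LMBR-style bonus for appending `token` to `prefix`.

    thetas[0] is a per-word term; thetas[n] weights the posterior of the
    n-gram ending at `token`. N-grams missing from `posteriors` count as 0.
    """
    seq = tuple(prefix) + (token,)
    bonus = thetas[0]
    for n in range(1, len(thetas)):
        if len(seq) < n:
            break
        bonus += thetas[n] * posteriors.get(seq[-n:], 0.0)
    return bonus

def beam_step(prefixes, scores, log_probs, posteriors, thetas, beam_size):
    """Expand all beam entries at once and keep the best `beam_size`.

    prefixes:  list of B token-id tuples (current hypotheses)
    scores:    array of shape (B,) with accumulated scores
    log_probs: array of shape (B, V) with next-token NMT log-probabilities
    """
    B, V = log_probs.shape
    # Assumption: the bonus is filled in with plain Python loops for clarity;
    # a deployed decoder would compute it with batched array operations.
    bonus = np.array([[ngram_bonus(p, t, posteriors, thetas) for t in range(V)]
                      for p in prefixes])
    total = scores[:, None] + log_probs + bonus      # shape (B, V)
    flat = total.ravel()
    top = np.argsort(-flat, kind="stable")[:min(beam_size, flat.size)]
    beam_idx, tok_idx = np.unravel_index(top, (B, V))
    new_prefixes = [prefixes[b] + (int(t),) for b, t in zip(beam_idx, tok_idx)]
    return new_prefixes, flat[top]

# Toy usage: two hypotheses, a three-word vocabulary, made-up posteriors.
prefixes = [(0,), (1,)]
scores = np.zeros(2)
log_probs = np.log(np.array([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]))
posteriors = {(2,): 0.9, (0, 2): 0.8}
new_prefixes, new_scores = beam_step(
    prefixes, scores, log_probs, posteriors, thetas=[0.0, 1.0, 1.0], beam_size=2)
```

In this toy setup, token 2 is the least likely continuation of prefix `(0,)` under the NMT scores alone, but the posterior bonus moves `(0, 2)` to the top of the beam. Treating the whole beam as a single `(B, V)` array is what allows the top-k selection to run as one operation rather than once per hypothesis.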
