Accelerating NMT Batched Beam Decoding with LMBR Posteriors for Deployment
Gonzalo Iglesias, William Tambellini, Adrià De Gispert, Eva Hasler, and Bill Byrne

TL;DR
This paper describes a batched beam decoding algorithm for neural machine translation (NMT) that incorporates lattice minimum Bayes-risk (LMBR) n-gram posteriors, and reports that LMBR still yields gains on top of strong Transformer results. It also covers acceleration strategies for deployment.
Contribution
It presents a batched beam decoding algorithm that uses LMBR n-gram posteriors, and discusses acceleration strategies for practical deployment, including how beam size and batching affect memory and speed.
Findings
LMBR n-gram posteriors yield gains on top of the best recently reported Transformer results
Beam size and batching both affect decoding memory and speed
Acceleration strategies are discussed for practical deployment
Abstract
We describe a batched beam decoding algorithm for NMT with LMBR n-gram posteriors, showing that LMBR techniques still yield gains on top of the best recently reported results with Transformers. We also discuss acceleration strategies for deployment, and the effect of the beam size and batching on memory and speed.
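As a rough illustration of how LMBR n-gram posteriors can enter beam decoding, the sketch below adds an n-gram bonus to the NMT log-probability at each step. It assumes the per-step form commonly used in the LMBR literature (a constant per-word weight plus weighted posteriors of the n-grams ending at the new token); the abstract does not give the paper's exact scoring rule, and the names `lmbr_step_bonus`, `combined_step_score`, `theta0`, and `thetas` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

NGram = Tuple[str, ...]


def lmbr_step_bonus(
    history: Sequence[str],
    token: str,
    posteriors: Dict[NGram, float],
    theta0: float,
    thetas: Sequence[float],
) -> float:
    """Bonus for extending `history` with `token` (assumed LMBR-style form).

    posteriors maps an n-gram to its posterior probability P(u | evidence);
    n-grams missing from the dict are treated as having posterior 0.
    thetas[n - 1] is the weight for n-grams of order n.
    """
    bonus = theta0  # constant per-word term
    for n, theta_n in enumerate(thetas, start=1):
        if n - 1 > len(history):
            break  # not enough context for an n-gram of this order
        context = tuple(history[len(history) - (n - 1):])
        bonus += theta_n * posteriors.get(context + (token,), 0.0)
    return bonus


def combined_step_score(
    nmt_log_prob: float,
    history: Sequence[str],
    token: str,
    posteriors: Dict[NGram, float],
    theta0: float,
    thetas: Sequence[float],
) -> float:
    """NMT log-probability of `token` plus the n-gram posterior bonus."""
    return nmt_log_prob + lmbr_step_bonus(history, token, posteriors, theta0, thetas)
```

In a batched decoder the same bonus would be computed for every candidate extension of every hypothesis in the batch before the top-k selection; the scalar version here only shows the arithmetic for a single hypothesis.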
