Efficient Inference for Multilingual Neural Machine Translation

Alexandre Berard; Dain Lee; Stéphane Clinchant; Kweonwoo Jung; Vassilina Nikoulina

arXiv:2109.06679·cs.CL·November 9, 2021

TL;DR

This paper presents methods to significantly speed up inference in multilingual neural machine translation (NMT). Combining a shallow decoder with vocabulary filtering makes inference more than twice as fast with no loss in translation quality, as measured across 380 language pairs.

Contribution

It combines a shallow decoder with vocabulary filtering to speed up multilingual NMT inference without sacrificing translation quality.
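Why a shallow decoder helps can be illustrated with a toy cost model. This is an assumption for illustration, not the paper's analysis: the encoder processes the whole source in parallel once, while the decoder must run all of its layers again for every generated token, so decoder depth dominates sequential work. The layer configurations (6-6 and 12-2) and the 30-token target length are hypothetical, and the resulting ratio is not a measured speedup.

```python
def sequential_layer_steps(enc_layers: int, dec_layers: int, target_len: int) -> int:
    """Toy count of layer evaluations that must happen one after another.

    Assumes the encoder handles the whole source in one parallel pass through
    enc_layers, while the decoder runs all dec_layers once per generated token.
    Ignores attention length effects, batching, and the output projection
    over the vocabulary.
    """
    return enc_layers + dec_layers * target_len


if __name__ == "__main__":
    # Hypothetical configurations, 30 target tokens each.
    balanced = sequential_layer_steps(6, 6, 30)  # 6 + 6*30 = 186
    shallow = sequential_layer_steps(12, 2, 30)  # 12 + 2*30 = 72
    print(balanced, shallow, round(balanced / shallow, 2))  # 186 72 2.58
```

Moving depth from the decoder to the encoder keeps total capacity while cutting the per-token sequential work. The model leaves out the per-token output projection over the full vocabulary, which is the cost that vocabulary filtering targets.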

Findings

01

More than twice as fast at inference, with no loss in quality

02

Effective in both small-scale and large-scale multilingual settings

03

Validated with BLEU and chrF (on 380 language pairs), robustness evaluation, and human evaluation

Abstract

Multilingual NMT has become an attractive solution for MT deployment in production. But matching bilingual quality comes at the cost of larger and slower models. In this work, we consider several ways to make multilingual NMT faster at inference without degrading its quality. We experiment with several "light decoder" architectures in two 20-language multi-parallel settings: small-scale on TED Talks and large-scale on ParaCrawl. Our experiments demonstrate that combining a shallow decoder with vocabulary filtering makes inference more than twice as fast with no loss in translation quality. We validate our findings with BLEU and chrF (on 380 language pairs), robustness evaluation, and human evaluation.
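The idea behind vocabulary filtering can be sketched in a few lines. In this minimal sketch, the output projection is computed only for a subset of token ids allowed for the target language, so the per-step softmax cost shrinks with the subset size. The way the subset is built here (tokens seen at least `min_count` times in target-language text) is an assumption, not necessarily the paper's criterion. `build_language_vocab`, `filtered_log_softmax`, and `greedy_next_token` are hypothetical names, and the tiny weight matrix is made-up data.

```python
import math
from typing import Dict, List, Sequence


def build_language_vocab(corpus: Sequence[Sequence[int]], min_count: int = 1) -> List[int]:
    """Hypothetical helper: token ids seen at least min_count times in target-language text."""
    counts: Dict[int, int] = {}
    for sentence in corpus:
        for tok in sentence:
            counts[tok] = counts.get(tok, 0) + 1
    return sorted(tok for tok, c in counts.items() if c >= min_count)


def filtered_log_softmax(
    hidden: Sequence[float],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
    allowed: Sequence[int],
) -> Dict[int, float]:
    """Score only the allowed rows of the output projection, then log-softmax over them.

    Dot products for tokens outside `allowed` are skipped entirely; those
    tokens implicitly receive probability 0.
    """
    logits = [
        sum(w * h for w, h in zip(weight[tok], hidden)) + bias[tok] for tok in allowed
    ]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return {tok: l - log_z for tok, l in zip(allowed, logits)}


def greedy_next_token(hidden, weight, bias, allowed) -> int:
    """Pick the highest-scoring token id within the filtered vocabulary."""
    scores = filtered_log_softmax(hidden, weight, bias, allowed)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    weight = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0], [3.0, 3.0]]  # 5 tokens, hidden size 2
    bias = [0.0] * 5
    hidden = [1.0, 0.6]
    allowed = build_language_vocab([[0, 1], [1, 3], [3]])
    print(allowed)  # [0, 1, 3]
    print(greedy_next_token(hidden, weight, bias, allowed))  # 3 (token 4 would win unfiltered)
```

Because the filtered subset is fixed per target language, it can be chosen once before decoding, and each step pays only for the rows it keeps.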
