# Unfolding and Shrinking Neural Machine Translation Ensembles

**Authors:** Felix Stahlberg, Bill Byrne

arXiv: 1704.03279 · 2017-07-24

## TL;DR

This paper shows how to unfold a neural machine translation (NMT) ensemble into a single large network and then shrink that network by reducing the dimensionality of its layers. On Japanese-English, the resulting network has the size and decoding speed of a single NMT network but performs on the level of a 3-ensemble system.

## Contribution

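The paper introduces a technique for unfolding an ensemble into a single large network that imitates the ensemble's output. It then shrinks the unfolded network by reducing the dimensionality of its layers, so that it decodes at single-model speed while retaining ensemble-level quality.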
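The idea of unfolding can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes each ensemble member is a tiny feed-forward net with one tanh hidden layer and a linear output, and that the ensemble averages those linear outputs. All function names (`forward`, `ensemble`, `unfold`) are hypothetical. Under these assumptions, stacking the hidden-layer weights and concatenating the scaled output weights gives one wider network that reproduces the ensemble average.

```python
import math

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(net, x):
    # net = (W_hidden, W_out): one tanh hidden layer, linear output (toy assumption).
    W_h, W_o = net
    hidden = [math.tanh(v) for v in matvec(W_h, x)]
    return matvec(W_o, hidden)

def ensemble(nets, x):
    # Run every member separately and average the outputs element-wise.
    outs = [forward(n, x) for n in nets]
    return [sum(col) / len(nets) for col in zip(*outs)]

def unfold(nets):
    # Build one wide network whose output equals the ensemble average.
    k = len(nets)
    # Hidden layer: stack the members' weight rows, so hidden units are concatenated.
    W_h = [row for (Wh, _) in nets for row in Wh]
    # Output layer: concatenate the members' output rows, scaled by 1/k.
    n_out = len(nets[0][1])
    W_o = [[w / k for (_, Wo) in nets for w in Wo[i]] for i in range(n_out)]
    return (W_h, W_o)

# Two hypothetical members with 2 and 3 hidden units.
nets = [
    ([[0.1, -0.2], [0.3, 0.4]], [[1.0, 2.0], [-1.0, 0.5]]),
    ([[0.5, 0.5], [-0.3, 0.2], [0.0, 1.0]], [[0.2, 0.1, -0.4], [0.7, -0.6, 0.3]]),
]
x = [1.0, -2.0]
avg = ensemble(nets, x)
single = forward(unfold(nets), x)
print(max(abs(a - b) for a, b in zip(avg, single)))  # difference is at rounding level
```

The unfolded network evaluates every member in a single pass over wider matrices instead of several separate passes. Averaging softmax probabilities rather than linear outputs would not reduce to a single matrix in this way, because softmax is non-linear; the toy sidesteps that by assumption.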

## Key findings

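- An ensemble can be unfolded into a single large network that imitates the ensemble's output; unfolding alone can already improve runtime in practice because more work can be done on the GPU.
- On Japanese-English, the shrunken network performs on the level of a 3-ensemble system.
- The shrunken network has the size and decoding speed of a single NMT network.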

## Abstract

Ensembling is a well-known technique in neural machine translation (NMT) to improve system performance. Instead of a single neural net, multiple neural nets with the same topology are trained separately, and the decoder generates predictions by averaging over the individual models. Ensembling often improves the quality of the generated translations drastically. However, it is not suitable for production systems because it is cumbersome and slow. This work aims to reduce the runtime to be on par with a single system without compromising the translation quality. First, we show that the ensemble can be unfolded into a single large neural network which imitates the output of the ensemble system. We show that unfolding can already improve the runtime in practice since more work can be done on the GPU. We proceed by describing a set of techniques to shrink the unfolded network by reducing the dimensionality of layers. On Japanese-English we report that the resulting network has the size and decoding speed of a single NMT network but performs on the level of a 3-ensemble system.
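To make "reducing the dimensionality of layers" concrete, here is a minimal sketch of one simple way to narrow a hidden layer without changing the network's output. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the toy network has one tanh hidden layer and a linear output, and the helper `merge_neurons` is hypothetical. When two hidden neurons have identical incoming weights, they always produce the same activation, so one can be removed and its outgoing weights added to the other.

```python
import math

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W_in, W_out, x):
    # One tanh hidden layer followed by a linear output (toy assumption).
    hidden = [math.tanh(v) for v in matvec(W_in, x)]
    return matvec(W_out, hidden)

def merge_neurons(W_in, W_out, keep, drop):
    # Remove hidden neuron `drop` and add its outgoing weights to neuron `keep`.
    # Exact only if both neurons have the same incoming weights; otherwise approximate.
    new_in = [row for i, row in enumerate(W_in) if i != drop]
    new_out = []
    for row in W_out:
        merged = list(row)
        merged[keep] += merged[drop]  # fold the dropped neuron's contribution in
        del merged[drop]              # then remove its column
        new_out.append(merged)
    return new_in, new_out

# Hidden neurons 0 and 2 share the same incoming weights.
W_in = [[0.5, -1.0], [0.2, 0.3], [0.5, -1.0]]
W_out = [[1.0, 2.0, 3.0], [0.5, -0.5, 0.25]]
x = [0.7, 0.1]

small_in, small_out = merge_neurons(W_in, W_out, keep=0, drop=2)
full = forward(W_in, W_out, x)
small = forward(small_in, small_out, x)
print(len(W_in), "->", len(small_in), "hidden units")
print(max(abs(a - b) for a, b in zip(full, small)))  # difference is at rounding level
```

In a real network, neurons are rarely exact duplicates, so a merge of this kind becomes an approximation and the choice of which neurons to remove matters.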

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.03279/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1704.03279/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1704.03279/full.md

---
Source: https://tomesphere.com/paper/1704.03279