# Sharp Models on Dull Hardware: Fast and Accurate Neural Machine Translation Decoding on the CPU

**Authors:** Jacob Devlin

arXiv: 1705.01991 · 2017-05-08

## TL;DR

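This paper speeds up neural machine translation (NMT) decoding on the CPU while keeping accuracy close to the state of the art. It takes two complementary approaches: beam-search optimizations that yield a 4.4x speedup without changing the decoder output, and a simple architecture (one RNN layer followed by stacked fully-connected layers) that is far cheaper to train and decode than a deep recurrent model.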

## Contribution

The paper introduces a set of beam-search decoding speedups and a new network architecture that together enable fast, accurate NMT decoding on the CPU.

## Key findings

- 4.4x beam-search speedup over a very efficient baseline decoder, with identical decoder output
- 38.3 BLEU on WMT English-French NewsTest2014
- Decodes at 100 words/sec on a single-threaded CPU

## Abstract

Attentional sequence-to-sequence models have become the new standard for machine translation, but one challenge of such models is a significant increase in training and decoding cost compared to phrase-based systems. Here, we focus on efficient decoding, with a goal of achieving accuracy close to the state-of-the-art in neural machine translation (NMT), while achieving CPU decoding speed/throughput close to that of a phrasal decoder. We approach this problem from two angles: First, we describe several techniques for speeding up an NMT beam search decoder, which obtain a 4.4x speedup over a very efficient baseline decoder without changing the decoder output. Second, we propose a simple but powerful network architecture which uses an RNN (GRU/LSTM) layer at the bottom, followed by a series of stacked fully-connected layers applied at every timestep. This architecture achieves similar accuracy to a deep recurrent model, at a small fraction of the training and decoding cost. By combining these techniques, our best system achieves a very competitive accuracy of 38.3 BLEU on WMT English-French NewsTest2014, while decoding at 100 words/sec on single-threaded CPU. We believe this is the best published accuracy/speed trade-off of an NMT system.
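To make the architecture described in the abstract concrete, here is a minimal pure-Python sketch of its shape: a single GRU layer at the bottom, followed by a stack of fully-connected layers applied independently at every timestep. This is not the author's implementation. The class names (`GRULayer`, `FCLayer`, `RecurrentPlusFC`), the ReLU activation in the fully-connected layers, the omission of GRU biases, the absence of residual connections, and the random initialization are all illustrative assumptions; the summary does not specify these details.

```python
import math
import random

def matvec(W, x):
    # Dense matrix-vector product with lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(*vs):
    # Element-wise sum of equal-length vectors.
    return [sum(t) for t in zip(*vs)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class GRULayer:
    """Hypothetical single GRU layer (Cho-style update, biases omitted)."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.Wz, self.Uz = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wr, self.Ur = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)
        self.Wh, self.Uh = rand_matrix(hid_dim, in_dim, rng), rand_matrix(hid_dim, hid_dim, rng)

    def step(self, x, h):
        z = sigmoid(vadd(matvec(self.Wz, x), matvec(self.Uz, h)))  # update gate
        r = sigmoid(vadd(matvec(self.Wr, x), matvec(self.Ur, h)))  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in vadd(matvec(self.Wh, x), matvec(self.Uh, rh))]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

class FCLayer:
    """Hypothetical fully-connected layer with ReLU (activation is an assumption)."""
    def __init__(self, dim, rng):
        self.W = rand_matrix(dim, dim, rng)
        self.b = [0.0] * dim

    def __call__(self, x):
        return [max(0.0, a) for a in vadd(matvec(self.W, x), self.b)]

class RecurrentPlusFC:
    """One recurrent layer, then num_fc fully-connected layers per timestep."""
    def __init__(self, in_dim, hid_dim, num_fc, seed=0):
        rng = random.Random(seed)
        self.gru = GRULayer(in_dim, hid_dim, rng)
        self.fc_stack = [FCLayer(hid_dim, rng) for _ in range(num_fc)]

    def forward(self, inputs):
        # Only this loop carries state across timesteps.
        h = [0.0] * self.gru.hid_dim
        rnn_states = []
        for x in inputs:
            h = self.gru.step(x, h)
            rnn_states.append(h)
        # The FC stack has no recurrent connections: each timestep is processed
        # independently, so these layers could be batched across timesteps.
        outputs = []
        for s in rnn_states:
            for layer in self.fc_stack:
                s = layer(s)
            outputs.append(s)
        return outputs

model = RecurrentPlusFC(in_dim=4, hid_dim=8, num_fc=3)
seq = [[0.1 * (t + i) for i in range(4)] for t in range(5)]
out = model.forward(seq)
print(len(out), len(out[0]))  # 5 8
```

The design point the sketch tries to show is that depth is added only through the per-timestep fully-connected layers, which have no hidden-to-hidden matrices, so the sequential recurrent work stays at a single layer regardless of how many layers are stacked on top.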

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.01991/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1705.01991/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1705.01991/full.md

---
Source: https://tomesphere.com/paper/1705.01991