Neural Machine Translation in Linear Time
Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu

TL;DR
The paper introduces ByteNet, a one-dimensional convolutional neural network for sequence processing that runs in time linear in the sequence length. It achieves state-of-the-art results in character-level language modeling and machine translation without relying on recurrent structures.
Contribution
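It presents ByteNet, a convolutional encoder-decoder architecture that uses dilated convolutions to enlarge its receptive field and dynamically unfolds the decoder over the encoder representation to handle source and target sequences of different lengths. The result is linear-time sequence processing with results the paper reports as better than those of recurrent models.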
Findings
The ByteNet decoder attains state-of-the-art performance on character-level language modeling.
It outperforms previous neural translation models on English-German translation.
The model's representations reflect expected token alignments.
Abstract
We present a novel neural network for processing sequences. The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence. The two network parts are connected by stacking the decoder on top of the encoder and preserving the temporal resolution of the sequences. To address the differing lengths of the source and the target, we introduce an efficient mechanism by which the decoder is dynamically unfolded over the representation of the encoder. The ByteNet uses dilation in the convolutional layers to increase its receptive field. The resulting network has two core properties: it runs in time that is linear in the length of the sequences and it sidesteps the need for excessive memorization. The ByteNet decoder attains state-of-the-art performance on character-level language modelling…
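The effect of dilation on the receptive field can be illustrated with a small sketch. This is not the authors' implementation: the kernel size of 3, the doubling dilation schedule (1, 2, 4, 8, 16), and the helper names `receptive_field` and `causal_dilated_conv1d` are assumptions chosen for illustration, and the convolution is a plain single-channel NumPy version without the residual blocks or normalization a real model would use. Each layer touches every position a constant number of times, so the cost of the stack grows linearly with sequence length.

```python
import numpy as np


def receptive_field(kernel_size, dilations):
    """Input positions visible to one output of a stack of dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated convolution: output t depends only on x[t], x[t - d], x[t - 2d], ...

    x: 1-D input signal, w: 1-D kernel, dilation: gap between kernel taps.
    The input is zero-padded on the left so the output has the same length as x.
    """
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    # O(k * len(x)) work per layer, i.e. linear in the sequence length.
    return np.array(
        [sum(w[j] * xp[t + j * dilation] for j in range(k)) for t in range(len(x))]
    )


def stack(x, kernel, dilations):
    """Apply one causal dilated convolution per dilation rate, in order."""
    for d in dilations:
        x = causal_dilated_conv1d(x, kernel, d)
    return x


# Hypothetical configuration for illustration only.
KERNEL = np.ones(3)
DILATIONS = [1, 2, 4, 8, 16]

# Five dilated layers see 63 positions; five undilated layers see only 11.
rf_dilated = receptive_field(3, DILATIONS)
rf_plain = receptive_field(3, [1] * 5)

# An impulse at position 0 spreads to exactly rf_dilated output positions.
impulse = np.zeros(64)
impulse[0] = 1.0
response = stack(impulse, KERNEL, DILATIONS)
```

With these assumed settings, `rf_dilated` is 63 and `rf_plain` is 11, so doubling the dilation rate lets the receptive field grow exponentially with depth while the per-layer cost stays linear in the sequence length.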
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
