Neural GPUs Learn Algorithms
Łukasz Kaiser, Ilya Sutskever

TL;DR
The paper introduces the Neural GPU, a highly parallel neural network architecture that learns algorithms such as binary addition and multiplication from examples and generalizes to inputs longer than those seen in training. Unlike sequential models such as Neural Turing Machines (NTMs), it is easier to train and efficient to run.
Contribution
It presents the Neural GPU architecture, which is highly parallel, easier to train than prior sequential models, and able to learn algorithms that generalize across input sizes.
Findings
Neural GPU successfully generalizes to long binary addition and multiplication.
Training with parameter sharing relaxation improves deep recurrent network learning.
Dropout and gradient noise enhance learning and generalization.
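The gradient-noise finding can be illustrated with a minimal sketch. It assumes plain SGD with zero-mean Gaussian noise added to each gradient component before the update. The function name, the learning rate, and the fixed noise scale are illustrative assumptions, not the paper's settings.

```python
import random


def noisy_sgd_step(params, grads, lr, noise_std, rng):
    """Return new parameters after one SGD step with noisy gradients.

    Each gradient component is perturbed by independent Gaussian noise
    with standard deviation `noise_std` (an assumed, fixed value here).
    """
    return [p - lr * (g + rng.gauss(0.0, noise_std)) for p, g in zip(params, grads)]


rng = random.Random(0)
params = [1.0, 2.0]
grads = [0.5, -0.5]
# With noise_std=0.0 this reduces to an ordinary SGD step.
plain = noisy_sgd_step(params, grads, lr=0.1, noise_std=0.0, rng=rng)
noisy = noisy_sgd_step(params, grads, lr=0.1, noise_std=0.3, rng=rng)
```

In practice the noise scale is usually decayed over training. The constant used here only keeps the sketch short.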
Abstract
Learning an algorithm from examples is a fundamental problem that has been widely studied. Recently it has been addressed using neural networks, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that use backpropagation to learn their own programming. Despite their appeal, NTMs have a weakness that is caused by their sequential nature: they are not parallel and are hard to train due to their large depth when unfolded. We present a neural network architecture to address this problem: the Neural GPU. It is based on a type of convolutional gated recurrent unit and, like the NTM, is computationally universal. Unlike the NTM, the Neural GPU is highly parallel, which makes it easier to train and efficient to run. An essential property of algorithms is their ability to handle inputs of arbitrary size. We show that the Neural GPU can be trained on…
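To make the "convolutional gated recurrent unit" in the abstract concrete, here is a minimal sketch of one gated update. It assumes the usual GRU gating form with the matrix products replaced by zero-padded convolutions. It is simplified to a 1D tape, and the kernel width, channel count, and all names (`conv1d_same`, `cgru_step`, `random_params`) are illustrative assumptions, not the authors' code.

```python
import math
import random


def conv1d_same(state, kernel, bias):
    """Zero-padded 'same' 1D convolution over a tape.

    state:  list of w positions, each a list of m channel values
    kernel: kernel[j][c][d] for tap j, input channel c, output channel d
    bias:   list of m values, one per output channel
    """
    w, m, k = len(state), len(bias), len(kernel)
    half = k // 2
    out = []
    for i in range(w):
        row = list(bias)
        for j in range(k):
            p = i + j - half
            if 0 <= p < w:  # positions outside the tape count as zeros
                for c in range(m):
                    for d in range(m):
                        row[d] += kernel[j][c][d] * state[p][c]
        out.append(row)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cgru_step(state, params):
    """One gated update; every position depends only on its local neighbourhood."""
    u = [[sigmoid(v) for v in row] for row in conv1d_same(state, *params["update"])]
    r = [[sigmoid(v) for v in row] for row in conv1d_same(state, *params["reset"])]
    gated = [[rv * sv for rv, sv in zip(rr, sr)] for rr, sr in zip(r, state)]
    cand = [[math.tanh(v) for v in row]
            for row in conv1d_same(gated, *params["candidate"])]
    # Convex combination of the old state and the candidate state.
    return [[uv * sv + (1.0 - uv) * cv for uv, sv, cv in zip(ur, sr, cr)]
            for ur, sr, cr in zip(u, state, cand)]


def random_params(m, k, rng):
    """Hypothetical initializer: three (kernel, bias) pairs with small random weights."""
    def bank():
        kernel = [[[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(m)]
                  for _ in range(k)]
        return kernel, [0.0] * m
    return {"update": bank(), "reset": bank(), "candidate": bank()}


rng = random.Random(0)
params = random_params(m=3, k=3, rng=rng)
short_tape = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
long_tape = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(11)]
# The same parameters apply to tapes of any width.
short_out = cgru_step(short_tape, params)
long_out = cgru_step(long_tape, params)
```

Because the kernels are independent of the tape width, the same parameters can be applied to inputs of any length, and all positions can be updated in parallel. Both properties are what the abstract emphasizes.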
Taxonomy
Topics: Machine Learning and Algorithms · Neural Networks and Applications · Ferroelectric and Negative Capacitance Devices
