Improving the Neural GPU Architecture for Algorithm Learning
Karlis Freivalds, Renars Liepins

TL;DR
This paper enhances the Neural GPU architecture to learn algorithms more efficiently, introducing new techniques that reduce training time and enable end-to-end learning of decimal multiplication.
Contribution
The paper proposes improvements to the Neural GPU, including hard nonlinearities with saturation costs and diagonal gates, that improve generalization and make it possible to learn decimal multiplication end-to-end.
Findings
Reduced training time for Neural GPU models
Achieved end-to-end decimal multiplication learning
Introduced hard nonlinearities with saturation costs, a generally applicable technique, and diagonal gates, which can be applied to active-memory models
Abstract
Algorithm learning is a core problem in artificial intelligence, with significant implications for the level of automation that machines can achieve. Recently, deep learning methods have been emerging for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, which is capable of learning multiplication. We present several improvements to the Neural GPU that substantially reduce training time and improve generalization. We introduce a new technique, hard nonlinearities with saturation costs, that has general applicability. We also introduce a technique of diagonal gates that can be applied to active-memory models. The proposed architecture is the first capable of learning decimal multiplication end-to-end.
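The abstract names hard nonlinearities with saturation costs but does not define them. The sketch below is one plausible reading, not the authors' implementation: piecewise-linear (clipped) versions of tanh and sigmoid, plus a penalty that grows once a pre-activation moves past a threshold. The exact clipping ranges, the function names, and the `limit=0.9` threshold are assumptions made for illustration.

```python
def hard_tanh(x):
    """Piecewise-linear tanh: identity on [-1, 1], clipped outside."""
    return max(-1.0, min(1.0, x))


def hard_sigmoid(x):
    """Piecewise-linear sigmoid: maps [-1, 1] linearly onto [0, 1], clipped outside."""
    return max(0.0, min(1.0, (x + 1.0) / 2.0))


def saturation_cost(pre_activations, limit=0.9):
    """Mean penalty for pre-activations whose magnitude exceeds `limit`.

    `limit` is a hypothetical threshold; values inside [-limit, limit]
    contribute nothing, values outside are penalized linearly.
    """
    penalties = [max(0.0, abs(x) - limit) for x in pre_activations]
    return sum(penalties) / len(penalties)


# Example pre-activations: two are deep in the clipped (zero-gradient) region.
pre = [-2.0, -0.5, 0.0, 0.95, 3.0]
activations = [hard_tanh(x) for x in pre]
gates = [hard_sigmoid(x) for x in pre]
cost = saturation_cost(pre)
```

In a training loop, a term like `cost` would be added to the task loss with a small weight, so that units are discouraged from sitting in the flat regions where the hard nonlinearities pass no gradient.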
Taxonomy
Topics: Neural Networks and Applications · Ferroelectric and Negative Capacitance Devices · Stochastic Gradient Optimization Techniques
