Improving the Neural GPU Architecture for Algorithm Learning

Karlis Freivalds; Renars Liepins

arXiv:1702.08727 · cs.NE · September 20, 2018 · 26 cites

TL;DR

This paper enhances the Neural GPU architecture to learn algorithms more efficiently, introducing new techniques that reduce training time and enable end-to-end learning of decimal multiplication.

Contribution

The paper proposes improvements to the Neural GPU, including hard nonlinearities with saturation costs and diagonal gates, which improve generalization and make it possible to learn decimal multiplication end-to-end.

Findings

01 Reduced training time for Neural GPU models

02 Achieved end-to-end decimal multiplication learning

03 Introduced general techniques applicable to active-memory models

Abstract

Algorithm learning is a core problem in artificial intelligence with significant implications for the level of automation that machines can achieve. Recently, deep learning methods have been emerging for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, which is capable of learning multiplication. We present several improvements to the Neural GPU that substantially reduce training time and improve generalization. We introduce a new technique, hard nonlinearities with saturation costs, that has general applicability. We also introduce a technique of diagonal gates that can be applied to active-memory models. The proposed architecture is the first capable of learning decimal multiplication end-to-end.
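
The abstract names hard nonlinearities with saturation costs without defining them. As a rough illustration only, the sketch below assumes that the hard nonlinearities are clipped-linear stand-ins for tanh and sigmoid, and that the saturation cost is a linear penalty on pre-activations whose magnitude exceeds a threshold. The function names, the exact formulas, and the 0.9 limit are assumptions made for this sketch and are not taken from the text above.

```python
def hard_tanh(x):
    # Clipped-linear stand-in for tanh: identity on [-1, 1], flat outside.
    return max(-1.0, min(1.0, x))


def hard_sigmoid(x):
    # Clipped-linear stand-in for sigmoid: maps [-1, 1] linearly onto [0, 1].
    return max(0.0, min(1.0, (x + 1.0) / 2.0))


def saturation_cost(values, limit=0.9):
    # Hypothetical penalty: mean distance by which |value| exceeds `limit`.
    # Adding it to the training loss would discourage pre-activations from
    # sitting deep in the flat regions, where the gradient is zero.
    excess = [max(abs(v) - limit, 0.0) for v in values]
    return sum(excess) / len(excess)


pre_activations = [-2.0, -0.5, 0.0, 0.95, 3.0]
activations = [hard_tanh(v) for v in pre_activations]
gates = [hard_sigmoid(v) for v in pre_activations]
penalty = saturation_cost(pre_activations)
```

Under these assumptions, only the three values with magnitude above 0.9 contribute to the penalty. In a training setup the penalty would presumably be scaled by a small weight and added to the task loss.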

Code & Models

Datasets

Kylan12/Synthetic-AI-ML-Dataset
dataset · 42 dl

Taxonomy

Topics: Neural Networks and Applications · Ferroelectric and Negative Capacitance Devices · Stochastic Gradient Optimization Techniques