Neural Turing Machines: Convergence of Copy Tasks

Janez Aleš

arXiv:1612.02336 · cs.NE · December 8, 2016

TL;DR

This paper addresses the difficulty of training neural Turing machines and reports highly accurate predictions on copy and repeat copy tasks for sequences longer than those used in training.

Contribution

The study improves the prediction quality of the original linear-memory neural Turing machine on copy and repeat copy tasks, rather than proposing a new architecture to address training difficulty and scaling.

Findings

01 Highly accurate copy task predictions on sequences six times longer than those used in training.

02 Highly accurate repeat copy task predictions with twice the repetition number and twice the sequence length used in training.

03 Suggests that a trained neural Turing machine can generalize to sequences longer than those it was trained on.

Abstract

The architecture of neural Turing machines is differentiable end to end and is trainable with gradient descent methods. Because of their large unfolded depth, neural Turing machines are hard to train, and because every access touches the complete memory linearly, they do not scale. Other architectures have been studied to overcome these difficulties. In this report we focus on improving the quality of prediction of the original linear memory architecture on copy and repeat copy tasks. Copy task predictions on sequences six times longer than those the neural Turing machine was trained on prove to be highly accurate, and so do predictions of repeat copy tasks for sequences with twice the repetition number and twice the sequence length the neural Turing machine was trained on.
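To make the two tasks concrete, the following is a minimal sketch of how copy and repeat copy examples could be generated and scored. It is not the paper's code. The function names (`copy_example`, `bit_accuracy`), the vector width of 8, the extra delimiter channel, and the choice to encode the repeat count in that channel are all assumptions made for illustration, and the abstract does not state the training lengths, so `train_len` below is hypothetical.

```python
import random


def random_bits(seq_len, width, rng):
    """Return seq_len random binary vectors, each of the given width."""
    return [[rng.randint(0, 1) for _ in range(width)] for _ in range(seq_len)]


def copy_example(seq_len, width=8, repeats=1, rng=None):
    """Build one (inputs, targets) pair for the copy or repeat copy task.

    Inputs have width + 1 channels; the last channel is a delimiter
    channel (an assumed encoding, not taken from the paper).
    repeats=1 gives the plain copy task.
    """
    rng = rng or random.Random()
    seq = random_bits(seq_len, width, rng)

    # Presentation phase: payload bits, delimiter channel set to 0.
    inputs = [row + [0] for row in seq]
    # Delimiter step: payload is blank, delimiter channel carries the
    # repeat count (assumption; other encodings are possible).
    inputs.append([0] * width + [repeats])
    # Recall phase: the network receives blanks and must emit the sequence.
    recall_len = seq_len * repeats
    inputs += [[0] * (width + 1) for _ in range(recall_len)]

    # Targets are blank while the sequence is presented, then the
    # sequence repeated `repeats` times.
    targets = [[0] * width for _ in range(seq_len + 1)]
    targets += [list(row) for _ in range(repeats) for row in seq]
    return inputs, targets


def bit_accuracy(predicted, target):
    """Fraction of bits in `predicted` that match `target`."""
    total = sum(len(row) for row in target)
    correct = sum(
        p == t
        for pred_row, tgt_row in zip(predicted, target)
        for p, t in zip(pred_row, tgt_row)
    )
    return correct / total


# Hypothetical lengths: the test sequence is six times the training length,
# mirroring the generalization setting described in the abstract.
train_len = 10
train_x, train_y = copy_example(train_len, rng=random.Random(0))
test_x, test_y = copy_example(6 * train_len, rng=random.Random(1))

# Repeat copy with twice the repetition number and twice the length.
rep_x, rep_y = copy_example(2 * train_len, repeats=2 * 2, rng=random.Random(2))
```

Under these assumptions, a trained model's thresholded outputs on `test_x` would be compared against `test_y` with `bit_accuracy` to measure how well it generalizes beyond the training length.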

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques

Methods: Softmax · Sigmoid Activation · Tanh Activation · Neural Turing Machine · Location-based Attention · Content-based Attention · Long Short-Term Memory