Neural Turing Machines: Convergence of Copy Tasks
Janez Aleš

TL;DR
This report addresses the training difficulty of neural Turing machines and shows accurate predictions on copy and repeat copy tasks for sequences longer than those seen during training.
Contribution
The study improves the prediction quality of the original linear-memory neural Turing machine on copy and repeat copy tasks with sequences longer than those used in training.
Findings
High accuracy on copy tasks with sequences six times longer than the training sequences.
Accurate predictions on repeat copy tasks with twice the sequence length and twice the number of repetitions used in training.
Demonstrates potential for neural Turing machines to handle larger sequences.
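To make the two tasks concrete, here is a minimal sketch of how one copy or repeat copy example could be generated. The layout is an assumption for illustration and is not taken from the report: random binary vectors of `width` bits, one extra delimiter channel, and the repeat count written onto that delimiter channel. The function name `make_copy_example` and all of its parameters are hypothetical.

```python
import random


def make_copy_example(seq_len, width=8, repeats=1, seed=None):
    """Build one (inputs, targets) pair for a copy (repeats=1) or repeat copy task.

    Assumed layout (illustrative only): `width` data channels plus one
    delimiter channel whose value at the delimiter step is the repeat count.
    """
    rng = random.Random(seed)
    # Random binary sequence the network has to reproduce.
    seq = [[float(rng.randint(0, 1)) for _ in range(width)] for _ in range(seq_len)]
    out_len = seq_len * repeats

    # Input phase: the sequence itself, with the delimiter channel at zero.
    inputs = [row + [0.0] for row in seq]
    # Delimiter step: data channels are zero, last channel carries the repeat count.
    inputs.append([0.0] * width + [float(repeats)])
    # Output phase: the network receives blank inputs while it emits the copy.
    inputs += [[0.0] * (width + 1) for _ in range(out_len)]

    # Targets are zero during the input phase and the delimiter step,
    # then the sequence repeated `repeats` times.
    targets = [[0.0] * width for _ in range(seq_len + 1)]
    for _ in range(repeats):
        targets += [row[:] for row in seq]
    return inputs, targets
```

Under this sketch, testing generalization amounts to calling the generator with a larger `seq_len` (and, for repeat copy, a larger `repeats`) than any value used during training, for example six times the training length for the copy task.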
Abstract
The architecture of neural Turing machines is differentiable end to end and is trainable with gradient descent methods. Due to their large unfolded depth, neural Turing machines are hard to train, and because every access touches the complete memory at linear cost, they do not scale. Other architectures have been studied to overcome these difficulties. In this report we focus on improving the quality of prediction of the original linear memory architecture on copy and repeat copy tasks. Copy task predictions on sequences six times longer than those the neural Turing machine was trained on prove to be highly accurate, and so do predictions on repeat copy tasks for sequences with twice the repetition number and twice the sequence length the neural Turing machine was trained on.
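The linear memory access mentioned above can be illustrated with a minimal sketch of content-based addressing as it is usually described for neural Turing machines: a softmax over the cosine similarity between a key and every memory row, sharpened by a key strength `beta`. This is not the report's implementation; the function names, the `eps` stabilizer, and the plain-list memory representation are assumptions made for illustration.

```python
import math


def content_weights(memory, key, beta=1.0, eps=1e-8):
    """Softmax over beta-scaled cosine similarity between `key` and each memory row."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    key_norm = norm(key)
    scores = []
    # One similarity per row: the cost of a single access grows linearly
    # with the number of memory rows.
    for row in memory:
        dot = sum(a * b for a, b in zip(row, key))
        scores.append(beta * dot / (norm(row) * key_norm + eps))

    # Numerically stable softmax.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read(memory, weights):
    """Read vector: the weighted sum of all memory rows."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]
```

Because both the weighting and the read visit every row, each time step costs time proportional to the memory size, which is the scaling limitation the abstract refers to.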
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques
Methods: Softmax · Sigmoid Activation · Tanh Activation · Neural Turing Machine · Location-based Attention · Content-based Attention · Long Short-Term Memory
