Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
Khurram Javed, Haseeb Shah, Rich Sutton, Martha White

TL;DR
This paper introduces scalable methods for real-time recurrent learning (RTRL) that add no noise or bias to the gradient, enabling online updates for large recurrent networks in reinforcement learning tasks.
Contribution
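It proposes two constraints on the network, either decomposing it into independent modules or learning it in stages, under which RTRL scales linearly with the number of parameters. Unlike prior scalable gradient estimators such as UORO and Truncated-BPTT, these constraints add no noise or bias to the gradient.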
Findings
Outperforms Truncated-BPTT on a prediction benchmark.
Effective in policy evaluation for Atari 2600 games.
Scales linearly with network size, enabling large-scale online learning.
Abstract
Constructing states from sequences of observations is an important component of reinforcement learning agents. One solution for state construction is to use recurrent neural networks. Back-propagation through time (BPTT) and real-time recurrent learning (RTRL) are two popular gradient-based methods for recurrent learning. BPTT requires complete trajectories of observations before it can compute the gradients and is unsuitable for online updates. RTRL can do online updates but scales poorly to large networks. In this paper, we propose two constraints that make RTRL scalable. We show that by either decomposing the network into independent modules or learning the network in stages, we can make RTRL scale linearly with the number of parameters. Unlike prior scalable gradient estimation algorithms, such as UORO and Truncated-BPTT, our algorithms do not add noise or bias to the gradient…
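To make the "independent modules" idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes each module (column) is a single tanh unit with its own input weights, a scalar self-recurrent weight, and a bias; the class name `ColumnarRTRL` and all parameter names are hypothetical. Because no column reads another column's state, each parameter influences only its own column's hidden value, so the RTRL sensitivity traces have exactly the same shape as the parameters and the per-step cost is linear in the parameter count.

```python
import numpy as np


class ColumnarRTRL:
    """Independent scalar tanh columns with exact RTRL traces.

    Illustrative simplification (an assumption, not the paper's code):
    column i computes h_i[t] = tanh(W[i] @ x[t] + u[i] * h_i[t-1] + b[i]).
    """

    def __init__(self, n_columns, n_inputs, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.5, size=(n_columns, n_inputs))
        self.u = rng.normal(0.0, 0.5, size=n_columns)
        self.b = np.zeros(n_columns)
        self.h = np.zeros(n_columns)
        # Sensitivity traces: d h_i / d (parameters of column i).
        # One trace entry per parameter, so memory is linear in parameters.
        self.dh_dW = np.zeros_like(self.W)
        self.dh_du = np.zeros_like(self.u)
        self.dh_db = np.zeros_like(self.b)

    def step(self, x):
        """Advance one time step and update the traces online."""
        h_prev = self.h
        h = np.tanh(self.W @ x + self.u * h_prev + self.b)
        d = 1.0 - h ** 2  # derivative of tanh at the pre-activation
        # Chain rule through the column's own recurrence only.
        self.dh_dW = d[:, None] * (x[None, :] + self.u[:, None] * self.dh_dW)
        self.dh_du = d * (h_prev + self.u * self.dh_du)
        self.dh_db = d * (1.0 + self.u * self.dh_db)
        self.h = h
        return h

    def gradients(self, dloss_dh):
        """Combine traces with dLoss/dh to get parameter gradients."""
        return (dloss_dh[:, None] * self.dh_dW,
                dloss_dh * self.dh_du,
                dloss_dh * self.dh_db)
```

At every step, `gradients` can be called with the derivative of the current loss with respect to the hidden state, so updates need no stored trajectory. The traces are exact for this architecture; the restriction is on what the network can represent (columns cannot feed each other recurrently), not on the accuracy of the gradient.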
Taxonomy
Topics: Neural Networks and Applications · Reinforcement Learning in Robotics · Model Reduction and Neural Networks
Methods: Unbiased Online Recurrent Optimization
