Real-Time Recurrent Reinforcement Learning
Julian Lemmel, Radu Grosu

TL;DR
This paper presents real-time recurrent reinforcement learning (RTRRL), a biologically inspired reinforcement learning framework that uses recurrent networks and online gradient computation to solve partially observable tasks and serves as a model of learning in biological neural networks.
Contribution
It introduces a biologically plausible RL algorithm for POMDPs that combines a Meta-RL architecture, temporal difference learning with eligibility traces, and online automatic differentiation.
Findings
Successfully solves diverse POMDP tasks
Models reward pathways in basal ganglia
Demonstrates biological plausibility in RL
Abstract
We introduce a biologically plausible RL framework for solving tasks in partially observable Markov decision processes (POMDPs). The proposed algorithm combines three integral parts: (1) a Meta-RL architecture resembling the mammalian basal ganglia; (2) a biologically plausible reinforcement learning algorithm that exploits temporal difference learning and eligibility traces to train the policy and the value function; (3) an online automatic differentiation algorithm for computing the gradients with respect to the parameters of a shared recurrent network backbone. Our experimental results show that the method is capable of solving a diverse set of partially observable reinforcement learning tasks. The algorithm, which we call real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
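To make the second ingredient concrete, here is a minimal sketch of a single temporal difference update with an eligibility trace. It is not the authors' implementation: it assumes a linear value function over a hand-supplied feature vector, and the function name td_lambda_step and the parameters gamma, lam, and alpha are illustrative. In the recurrent setting described in the abstract, the feature vector would presumably be replaced by the gradient of the value estimate with respect to the network parameters, computed online.

```python
from typing import List, Tuple


def td_lambda_step(
    w: List[float],       # value-function weights (hypothetical linear critic)
    z: List[float],       # eligibility trace, same length as w
    x: List[float],       # features of the current observation
    x_next: List[float],  # features of the next observation
    reward: float,
    done: bool,
    gamma: float = 0.99,  # discount factor (assumed value)
    lam: float = 0.9,     # trace decay (assumed value)
    alpha: float = 0.1,   # learning rate (assumed value)
) -> Tuple[List[float], List[float], float]:
    """One online TD(lambda) update with an accumulating trace."""
    v = sum(wi * xi for wi, xi in zip(w, x))
    # The value of a terminal state is defined to be zero.
    v_next = 0.0 if done else sum(wi * xi for wi, xi in zip(w, x_next))
    # TD error: how much better or worse the outcome was than predicted.
    delta = reward + gamma * v_next - v
    # Decay the old trace, then add the gradient of v w.r.t. w (here just x).
    z_new = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
    # Move every weight in proportion to its eligibility and the TD error.
    w_new = [wi + alpha * delta * zi for wi, zi in zip(w, z_new)]
    return w_new, z_new, delta


if __name__ == "__main__":
    w, z = [0.0, 0.0], [0.0, 0.0]
    w, z, delta = td_lambda_step(w, z, [1.0, 0.0], [0.0, 1.0], reward=1.0, done=False)
    print(w, z, delta)
```

Because the update needs only the current transition and the running trace, it can be applied at every time step without storing or replaying past trajectories, which is the property that makes this family of updates suitable for online learning.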
Taxonomy
Topics: Neural dynamics and brain function · Neural and Behavioral Psychology Studies · Reinforcement Learning in Robotics
