Recursive Reinforcement Learning
Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak

TL;DR
This paper introduces Recursive Q-learning, a model-free reinforcement learning algorithm for environments modeled as recursive Markov decision processes (RMDPs), that is, collections of MDPs that can invoke one another recursively. Recursive structure can therefore be handled directly instead of being flattened by hand.
Contribution
It develops the first RL algorithm that learns optimal policies directly in environments modeled as recursive MDPs, rather than relying on a hand-designed "flat" representation of the environment.
Findings
Recursive Q-learning provably converges for finite, single-exit and deterministic multi-exit RMDPs under mild assumptions.
RMDPs can model probabilistic programs with recursive procedure calls.
It offers a transparent alternative to manual feature engineering.
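As a rough illustration of model-free learning over an RMDP, the following Python sketch runs tabular Q-learning on a tiny recursive component. It is our own simplification under stated assumptions, not the paper's exact update rule: it assumes a single-exit RMDP with undiscounted total reward, and it values a call as the learned value of the callee's entry node plus the value of the caller's return port (the node where execution resumes after the callee exits). Under that assumption a single Q-table indexed by (component, node, action) is shared by every invocation. The component F, its box b, the dictionary encoding, and the names `sample`, `value`, and `recursive_q_learning` are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical single-exit RMDP: component F may recursively call itself via box "b".
# transitions[node][action] = list of (probability, successor, reward);
# a successor ("call", box) invokes the component referred to by that box.
RMDP = {
    "F": {
        "entry": "en",
        "exit": "ex",
        "boxes": {"b": "F"},
        "transitions": {
            "en": {
                "leaf": [(1.0, "ex", -2.0)],
                "try": [(0.5, "ex", 0.0), (0.5, ("call", "b"), 0.0)],
            },
            ("b", "ex"): {"go": [(1.0, "ex", -1.0)]},  # return port of box b
        },
    }
}


def sample(outcomes, rng):
    """Draw (successor, reward) from a list of (probability, successor, reward)."""
    r, acc = rng.random(), 0.0
    for p, succ, reward in outcomes:
        acc += p
        if r < acc:
            break
    return succ, reward


def value(Q, rmdp, comp, node):
    """Greedy value of a node; reaching the exit is worth 0 (undiscounted total reward)."""
    if node == rmdp[comp]["exit"]:
        return 0.0
    return max(Q[(comp, node, a)] for a in rmdp[comp]["transitions"][node])


def recursive_q_learning(rmdp, main, episodes=20000, alpha=0.01, eps=0.2,
                         seed=0, max_steps=10000):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(component, node, action)], shared by all invocations
    for _ in range(episodes):
        comp, node, stack = main, rmdp[main]["entry"], []
        for _ in range(max_steps):
            if node == rmdp[comp]["exit"]:
                if not stack:
                    break  # top-level exit: episode over
                comp, box = stack.pop()
                node = (box, node)  # resume at the caller's return port
                continue
            actions = list(rmdp[comp]["transitions"][node])
            if rng.random() < eps:
                a = rng.choice(actions)  # explore
            else:
                a = max(actions, key=lambda x: Q[(comp, node, x)])  # exploit
            succ, reward = sample(rmdp[comp]["transitions"][node][a], rng)
            if isinstance(succ, tuple) and succ[0] == "call":
                callee = rmdp[comp]["boxes"][succ[1]]
                # Assumed additive split: callee's entry-to-exit value + caller's return-port value.
                target = (reward + value(Q, rmdp, callee, rmdp[callee]["entry"])
                          + value(Q, rmdp, comp, (succ[1], rmdp[callee]["exit"])))
                stack.append((comp, succ[1]))  # remember where to resume
                nxt = (callee, rmdp[callee]["entry"])
            else:
                target = reward + value(Q, rmdp, comp, succ)
                nxt = (comp, succ)
            Q[(comp, node, a)] += alpha * (target - Q[(comp, node, a)])
            comp, node = nxt
    return Q
```

For this toy component, choosing `try` everywhere gives a value V satisfying V = 0.5*0 + 0.5*(V - 1), i.e. V = -1, which beats the -2 of `leaf`. After `Q = recursive_q_learning(RMDP, "F")`, the entry `Q[("F", "en", "try")]` should therefore settle near -1 and exceed `Q[("F", "en", "leaf")]`, which approaches -2. The call stack is only used to know where to resume after an exit; the learned values themselves are context-free, which is what the single-exit, additive-reward assumption is meant to allow.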
Abstract
Recursion is the fundamental paradigm to finitely describe potentially infinite objects. As state-of-the-art reinforcement learning (RL) algorithms cannot directly reason about recursion, they must rely on the practitioner's ingenuity in designing a suitable "flat" representation of the environment. The resulting manual feature constructions and approximations are cumbersome and error-prone; their lack of transparency hampers scalability. To overcome these challenges, we develop RL algorithms capable of computing optimal policies in environments described as a collection of Markov decision processes (MDPs) that can recursively invoke one another. Each constituent MDP is characterized by several entry and exit points that correspond to input and output values of these invocations. These recursive MDPs (or RMDPs) are expressively equivalent to probabilistic pushdown systems (with…
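The call-and-return mechanics described in the abstract can be illustrated with a small simulator that keeps an explicit call stack, which is the intuition behind the correspondence between RMDPs and pushdown systems. This is a minimal Python sketch under our own assumptions: the dictionary encoding, the recursive component F with boxes b1 and b2, and the hypothetical `depth_policy` are illustrative only, and letting the policy look at the stack depth is just one simple choice of history-dependent policy.

```python
import random

# Hypothetical RMDP encoding: each component has one entry, a set of exits,
# boxes (box name -> called component) and
# transitions[node][action] = list of (probability, successor, reward).
# A successor ("call", box) invokes the box's component; when that component
# reaches exit x, the caller resumes at the return port (box, x).
RMDP = {
    "F": {
        "entry": "en",
        "exits": {"ex"},
        "boxes": {"b1": "F", "b2": "F"},  # F calls itself twice
        "transitions": {
            "en": {
                "stop": [(1.0, "ex", 1.0)],
                "call": [(1.0, ("call", "b1"), 0.0)],
            },
            ("b1", "ex"): {"go": [(1.0, ("call", "b2"), 0.0)]},
            ("b2", "ex"): {"go": [(1.0, "ex", 0.0)]},
        },
    }
}


def run(rmdp, main, policy, rng, max_steps=1000):
    """Simulate one run; returns (total reward, whether the top-level component exited)."""
    stack = []  # pending return points: (caller component, box)
    comp, node, total = main, rmdp[main]["entry"], 0.0
    for _ in range(max_steps):
        if node in rmdp[comp]["exits"]:
            if not stack:
                return total, True
            comp, box = stack.pop()  # return: pop the call stack
            node = (box, node)       # resume at the matching return port
            continue
        action = policy(comp, node, len(stack))
        r, acc = rng.random(), 0.0
        for p, succ, reward in rmdp[comp]["transitions"][node][action]:
            acc += p
            if r < acc:
                break
        total += reward
        if isinstance(succ, tuple) and succ[0] == "call":
            stack.append((comp, succ[1]))  # call: push the return point
            comp = rmdp[comp]["boxes"][succ[1]]
            node = rmdp[comp]["entry"]
        else:
            node = succ
    return total, False  # step budget exhausted (e.g. unbounded recursion)


def depth_policy(comp, node, depth):
    """Hypothetical policy: keep recursing until the call stack has depth 2."""
    if node == "en":
        return "call" if depth < 2 else "stop"
    return "go"
```

With `depth_policy`, `run(RMDP, "F", depth_policy, random.Random(0))` unfolds a binary call tree with four leaves and returns `(4.0, True)`. A policy that always chooses `call` never reaches an exit, so the stack keeps growing until the step budget runs out and `run` returns `(0.0, False)`.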
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Evolutionary Algorithms and Applications
Methods: Q-Learning
