Programmatic Reinforcement Learning: Navigating Gridworlds
Guruprerana Shabadi, Nathanaël Fijalkow, Théo Matricon

TL;DR
This paper initiates a theoretical study of programmatic reinforcement learning in gridworlds, providing bounds on policy sizes and an algorithm for synthesis, bridging machine learning and formal methods.
Contribution
It introduces a formal framework for programmatic RL, establishes size bounds for optimal policies, and develops a synthesis algorithm with a prototype implementation.
Findings
Upper bounds on the size of optimal programmatic policies
An algorithm for synthesizing programmatic policies
Prototype implementation demonstrating the approach
Abstract
The field of reinforcement learning (RL) is concerned with algorithms for learning optimal policies in unknown stochastic environments. Programmatic RL studies representations of policies as programs, that is, policies built from higher-order constructs such as control loops. Despite attracting a lot of attention at the intersection of the machine learning and formal methods communities, very little is known on the theoretical front about programmatic RL: What are good classes of programmatic policies? How large are optimal programmatic policies? How can we learn them? The goal of this paper is to give first answers to these questions, initiating a theoretical study of programmatic RL. Considering a class of gridworld environments, we define a class of programmatic policies. Our main contributions are to place upper bounds on the size of optimal programmatic policies, and to construct an algorithm…
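To make the idea of a policy written as a program with control loops concrete, here is a minimal illustrative sketch. It is not the policy class or the synthesis algorithm from the paper: the grid layout, the `step` and `run_policy` helpers, and the "go right until blocked, then go down until blocked" policy are all hypothetical, and the environment is deterministic for simplicity, whereas the abstract speaks of stochastic environments.

```python
from typing import List, Tuple

State = Tuple[int, int]  # (row, col); hypothetical state representation

# Hypothetical action set: each action is a (row, col) offset.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Hypothetical gridworld: '.' is free, '#' is a wall, 'G' is the goal.
GRID = [
    "....",
    ".##.",
    "...G",
]


def step(grid: List[str], state: State, action: str) -> State:
    """Deterministic move; the agent stays in place if a wall or edge blocks it."""
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#":
        return (r, c)
    return state


def run_policy(grid: List[str], start: State, max_steps: int = 100) -> List[State]:
    """A policy written as a program: two loops, one per direction.

    while not blocked: go right
    while not blocked: go down
    """
    state = start
    trace = [state]
    for action in ("right", "down"):
        # Each loop repeats one action until the move has no effect.
        while len(trace) <= max_steps:
            nxt = step(grid, state, action)
            if nxt == state:
                break
            state = nxt
            trace.append(state)
    return trace


if __name__ == "__main__":
    trace = run_policy(GRID, (0, 0))
    print(trace)
    print("reached goal:", GRID[trace[-1][0]][trace[-1][1]] == "G")
```

The point of the sketch is that the policy's description (two loops) stays the same size however wide or tall the grid is, while a tabular policy would need one entry per cell. That contrast is what makes the size of a programmatic policy a natural quantity to bound.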
Taxonomy
Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics · Scheduling and Optimization Algorithms
