Synthesizing Programmatic Policies with Actor-Critic Algorithms and ReLU Networks
Spyros Orfanos, Levi H. S. Lelis

TL;DR
This paper demonstrates that actor-critic algorithms combined with ReLU networks can directly synthesize interpretable, programmatic policies without the need for PIRL-specific algorithms, producing effective and human-readable control policies.
Contribution
The paper introduces a method that trains a ReLU neural network policy with an actor-critic algorithm and then translates that network directly into a programmatic policy, bypassing the need for PIRL-specific search algorithms.
Findings
Translated policies are short and effective.
The approach outperforms PIRL algorithms in several control tasks.
Policies are human-readable programs built from if-then-else structures and linear expressions.
Abstract
Programmatically Interpretable Reinforcement Learning (PIRL) encodes policies in human-readable computer programs. Novel algorithms were recently introduced with the goal of handling the lack of gradient signal to guide the search in the space of programmatic policies. Most of such PIRL algorithms first train a neural policy that is used as an oracle to guide the search in the programmatic space. In this paper, we show that such PIRL-specific algorithms are not needed, depending on the language used to encode the programmatic policies. This is because one can use actor-critic algorithms to directly obtain a programmatic policy. We use a connection between ReLU neural networks and oblique decision trees to translate the policy learned with actor-critic algorithms into programmatic policies. This translation from ReLU networks allows us to synthesize policies encoded in programs with…
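The connection between ReLU networks and oblique decision trees mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single hidden layer and uses made-up weights; the function names (`relu_net`, `leaf_affine`, `tree_policy`, `to_program`) are hypothetical and this is not the authors' translation procedure. Each hidden neuron becomes one oblique test (a linear inequality over the input), and each leaf holds the affine function the network computes for that pattern of active neurons.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def relu_net(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU network: W2 * relu(W1 x + b1) + b2."""
    hidden = [max(dot(row, x) + b, 0.0) for row, b in zip(W1, b1)]
    return [dot(row, hidden) + b for row, b in zip(W2, b2)]

def leaf_affine(pattern, W1, b1, W2, b2):
    """Affine map (A, c) the network computes when neuron k is active iff pattern[k] == 1."""
    n_hidden, n_in = len(b1), len(W1[0])
    A = [[sum(row[k] * pattern[k] * W1[k][j] for k in range(n_hidden))
          for j in range(n_in)] for row in W2]
    c = [sum(row[k] * pattern[k] * b1[k] for k in range(n_hidden)) + b
         for row, b in zip(W2, b2)]
    return A, c

def tree_policy(x, W1, b1, W2, b2):
    """Evaluate the equivalent tree: one oblique test per neuron, affine leaf."""
    pattern = tuple(1 if dot(row, x) + b > 0 else 0 for row, b in zip(W1, b1))
    A, c = leaf_affine(pattern, W1, b1, W2, b2)
    return [dot(row, x) + ci for row, ci in zip(A, c)]

def affine_str(weights, bias):
    terms = [f"{w:.2f}*x[{j}]" for j, w in enumerate(weights)]
    return " + ".join(terms + [f"{bias:.2f}"])

def to_program(W1, b1, W2, b2, pattern=()):
    """Render the tree as nested if-then-else text with linear expressions."""
    depth = len(pattern)
    pad = "    " * depth
    if depth == len(b1):  # all neurons decided: emit the leaf's affine output
        A, c = leaf_affine(pattern, W1, b1, W2, b2)
        outputs = ", ".join(affine_str(row, ci) for row, ci in zip(A, c))
        return f"{pad}return [{outputs}]\n"
    test = affine_str(W1[depth], b1[depth])
    return (f"{pad}if {test} > 0:\n"
            + to_program(W1, b1, W2, b2, pattern + (1,))
            + f"{pad}else:\n"
            + to_program(W1, b1, W2, b2, pattern + (0,)))

# Illustrative weights only (assumed, not taken from the paper).
W1 = [[1.0, -1.0], [0.5, 2.0]]
b1 = [0.0, -1.0]
W2 = [[1.0, -2.0]]
b2 = [0.5]

print(to_program(W1, b1, W2, b2))
```

This naive enumeration produces one leaf per activation pattern, so the tree doubles in size with every hidden neuron, and some leaves may be unreachable for any input. It therefore only yields short programs when the network is small.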
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
Methods: Jigsaw · PIRL
