Interpretable Policies for Reinforcement Learning by Genetic Programming
Daniel Hein, Steffen Udluft, Thomas A. Runkler

TL;DR
This paper introduces GPRL (genetic programming for reinforcement learning), a model-based batch approach that learns interpretable policy equations from pre-existing state-action trajectory data and outperforms a symbolic regression baseline on benchmark tasks.
Contribution
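GPRL combines model-based batch reinforcement learning with genetic programming to learn simple algebraic policy equations autonomously from batch data. It improves on a baseline that uses symbolic regression to imitate an existing well-performing but non-interpretable policy.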
Findings
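GPRL learns effective policies on the benchmark tasks, expressed as compact, human-readable algebraic equations.
GPRL outperforms the symbolic regression baseline in policy quality.
Interpretable policies of this kind are feasible for industrial applications, where domain experts need to understand and evaluate a controller before deploying it.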
Abstract
The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straightforward method that uses genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning…
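The abstract contrasts two ways of scoring a candidate policy equation: by its return in rollouts on a model (the GPRL idea), or by how closely it imitates an existing policy (the symbolic regression baseline). The sketch below illustrates only that scoring step and is not the authors' implementation. The dynamics in `model_step`, the `teacher_policy`, the fixed `CANDIDATES` set, and all constants are hypothetical stand-ins. In GPRL the model would be learned from the batch data and the equations would be evolved by genetic programming rather than listed by hand.

```python
import math

# ASSUMPTION: toy 1-D dynamics standing in for a world model that would be
# learned from the batch of state-action trajectories.
def model_step(state, action):
    next_state = 0.9 * state + 0.5 * action
    reward = -(next_state ** 2)  # reward is highest when the state is at 0
    return next_state, reward

def clip(action, low=-1.0, high=1.0):
    """Keep actions inside an assumed admissible range."""
    return max(low, min(high, action))

# ASSUMPTION: hand-picked candidate equations; a GP run would evolve these.
CANDIDATES = {
    "a = 0": lambda s: 0.0,
    "a = -s": lambda s: -s,
    "a = -1.8 * s": lambda s: -1.8 * s,
    "a = -3 * s": lambda s: -3.0 * s,
}

# ASSUMPTION: stand-in for a well-performing but opaque existing policy.
def teacher_policy(state):
    return math.tanh(-2.0 * state)

START_STATES = [-0.5, -0.25, 0.25, 0.5]

def rollout_return(policy, start_states, horizon=20):
    """Model-based fitness: mean return of the policy over model rollouts."""
    total = 0.0
    for state in start_states:
        for _ in range(horizon):
            state, reward = model_step(state, clip(policy(state)))
            total += reward
    return total / len(start_states)

def imitation_error(policy, teacher, states):
    """Imitation fitness: mean squared error against the teacher's actions."""
    errors = [(clip(policy(s)) - teacher(s)) ** 2 for s in states]
    return sum(errors) / len(errors)

# Rank the candidates under each fitness notion.
best_by_return = max(
    CANDIDATES, key=lambda name: rollout_return(CANDIDATES[name], START_STATES)
)
best_by_imitation = min(
    CANDIDATES,
    key=lambda name: imitation_error(CANDIDATES[name], teacher_policy, START_STATES),
)

print("best by model rollout return:", best_by_return)
print("best by imitation error:", best_by_imitation)
```

The rollout score depends only on the model and the reward, so it needs no teacher policy. The imitation score can only be as good as the policy being copied. In this toy setting both rankings happen to pick the same equation, which need not hold in general.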
