Human-Readable Programs as Actors of Reinforcement Learning Agents Using Critic-Moderated Evolution

Senne Deproost; Denis Steckelmacher; Ann Nowé

arXiv:2410.21940·cs.LG·October 30, 2024

TL;DR

This paper introduces a method to directly synthesize human-readable programs as reinforcement learning policies during training, improving transparency and efficiency over traditional post-hoc distillation methods.

Contribution

It proposes combining TD3 critics with a genetic algorithm so that an interpretable program is learned directly as the policy during training, rather than distilled from a black-box policy afterwards.
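To make the idea of a critic-based objective concrete, here is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a candidate program's fitness is the average critic value of the actions it produces on a batch of states. Everything in it is hypothetical: `q1` and `q2` are hand-written stand-ins for TD3's two trained critic networks, the "program" is a toy linear expression with two genes, taking the minimum of the two critics is an assumption borrowed from clipped double Q-learning, and the evolutionary loop is a plain elitist mutate-and-select scheme.

```python
import random

def q1(state, action):
    # Hypothetical stand-in critic: highest when action == 2 * state.
    return -(action - 2.0 * state) ** 2

def q2(state, action):
    # Second stand-in critic, slightly different so that min() matters.
    return -(action - 2.0 * state) ** 2 - 0.1 * abs(action)

def run_program(genes, state):
    # Toy "program": the readable expression  action = a * state + b.
    a, b = genes
    return a * state + b

def fitness(genes, states):
    # Assumption: score a program by the more pessimistic critic's value,
    # averaged over a batch of states.
    total = 0.0
    for s in states:
        act = run_program(genes, s)
        total += min(q1(s, act), q2(s, act))
    return total / len(states)

def evolve(states, pop_size=30, generations=60, sigma=0.2, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(-3, 3), rng.uniform(-3, 3))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best third unchanged (elitism) ...
        population.sort(key=lambda g: fitness(g, states), reverse=True)
        elite = population[: pop_size // 3]
        # ... and refill the population with Gaussian-mutated copies.
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(elite)
            children.append((a + rng.gauss(0, sigma),
                             b + rng.gauss(0, sigma)))
        population = elite + children
    return max(population, key=lambda g: fitness(g, states))

states = [i / 10.0 for i in range(-10, 11)]
best = evolve(states)
print("action = %.2f * state + %.2f" % best)
```

The point of the sketch is that the search never sees a target action to imitate: candidates are ranked only by what the critics think of their behaviour, which is what distinguishes this setup from post-hoc distillation with a Mean Squared Error loss.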

Findings

01 Demonstrates high sample efficiency in a gridworld environment

02 Shows improved explainability of learned policies

03 Validates the approach's effectiveness and transparency

Abstract

With Deep Reinforcement Learning (DRL) being increasingly considered for the control of real-world systems, the lack of transparency of the neural network at the core of RL becomes a concern. Programmatic Reinforcement Learning (PRL) is able to create representations of this black-box in the form of source code, not only increasing the explainability of the controller but also allowing for user adaptations. However, these methods focus on distilling a black-box policy into a program and do so after learning, using the Mean Squared Error between produced and desired behaviour and discarding other elements of the RL algorithm. The distilled policy may therefore perform significantly worse than the black-box learned policy. In this paper, we propose to directly learn a program as the policy of an RL agent. We build on TD3 and use its critics as the basis of the objective function of a…

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

SenneDeproost/CM-GP
PyTorch · Official

Videos

No videos yet.

Taxonomy

Topics: Evolutionary Algorithms and Applications

Methods: Dense Connections · Target Policy Smoothing · Adam · Clipped Double Q-learning · Experience Replay · Twin Delayed Deep Deterministic · Focus