Programmatic Policy Extraction by Iterative Local Search

Rasmus Larsen; Mikkel Nørgaard Schmidt

arXiv:2201.06863·cs.AI·January 19, 2022

TL;DR

This paper introduces a simple method for extracting an interpretable, programmatic policy from a pretrained neural network policy using iterative local search, demonstrated on a pendulum swing-up problem.

Contribution

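The paper combines imitation-projection and dataset aggregation with a local search heuristic to extract a simple, interpretable programmatic policy directly from a pretrained neural policy. Programmatic policies of this kind are easier to interpret and more amenable to formal verification.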

Findings

01

Extracted policies are simple and interpretable.

02

Extracted policies perform almost as well as the original policy they were extracted from.

03

The method works both when the teacher is a hand-crafted expert policy and when it is a learned neural policy.

Abstract

Reinforcement learning policies are often represented by neural networks, but programmatic policies are preferred in some cases because they are more interpretable, amenable to formal verification, or generalize better. While efficient algorithms for learning neural policies exist, learning programmatic policies is challenging. Combining imitation-projection and dataset aggregation with a local search heuristic, we present a simple and direct approach to extracting a programmatic policy from a pretrained neural policy. After examining our local search heuristic on a programming-by-example problem, we demonstrate our programmatic policy extraction method on a pendulum swing-up problem. Both when trained using a hand-crafted expert policy and when trained using a learned neural policy, our method discovers simple and interpretable policies that perform almost as well as the original.
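
The loop described in the abstract (imitate an expert, aggregate the states the current program visits, refit by local search) can be sketched roughly as follows. This is only an illustrative toy, not the authors' implementation: the 1-D system, the expert `a = -x`, the two-parameter linear "program" `(k, b)`, and every function name are assumptions made for the example. The paper's own program space, search heuristic, and pendulum task are not reproduced here.

```python
# Toy sketch: dataset aggregation + local search over a tiny program space.
# Everything here (dynamics, expert, program form) is a hypothetical stand-in.

def expert(x):
    # Assumed teacher policy: push the state back toward zero.
    return -x

def make_policy(program):
    # A "program" is just the pair (k, b) in the rule a = k * x + b.
    k, b = program
    return lambda x: k * x + b

def rollout(policy, x0, steps=20, dt=0.1):
    # Simulate the assumed dynamics x <- x + dt * a and record visited states.
    xs, x = [], x0
    for _ in range(steps):
        xs.append(x)
        x = x + dt * policy(x)
    return xs

def loss(program, data):
    # Mean squared disagreement with the expert's labelled actions.
    pi = make_policy(program)
    return sum((pi(x) - a) ** 2 for x, a in data) / len(data)

def local_search(program, data, step=0.25, max_steps=2):
    # Greedy hill climbing: move to the best neighbour while it strictly improves.
    for _ in range(max_steps):
        k, b = program
        neighbours = [(k + step, b), (k - step, b), (k, b + step), (k, b - step)]
        best = min(neighbours, key=lambda p: loss(p, data))
        if loss(best, data) >= loss(program, data):
            break
        program = best
    return program

def extract(iterations=4, starts=(-1.0, 1.0)):
    program = (0.0, 0.0)
    # Seed the dataset with states visited by the expert itself.
    data = [(x, expert(x)) for x0 in starts for x in rollout(expert, x0)]
    for _ in range(iterations):
        program = local_search(program, data)
        # Aggregate: label the states the current program visits with expert actions.
        visited = [x for x0 in starts for x in rollout(make_policy(program), x0)]
        data += [(x, expert(x)) for x in visited]
    return program

if __name__ == "__main__":
    print(extract())
```

In this toy setting the search recovers the expert's rule exactly, because the expert happens to lie inside the program space. In general the extracted program only approximates the teacher, which is consistent with the abstract's "almost as well as the original".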
