Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

Max Liu; Chan-Hung Yu; Wei-Hsu Lee; Cheng-Wei Hung; Yen-Chun Chen; Shao-Hua Sun

arXiv:2405.16450·cs.LG·March 12, 2025

TL;DR

This paper introduces LLM-GS, a framework that uses large language models to guide search in programmatic reinforcement learning, significantly improving sample efficiency and enabling non-programmers to generate effective policies from natural language descriptions.

Contribution

The paper proposes a novel LLM-guided search framework that combines a Pythonic-DSL strategy with a Scheduled Hill Climbing algorithm to enhance program synthesis in reinforcement learning.

Findings

1. LLM-GS outperforms existing PRL methods in the Karel domain.

2. The Pythonic-DSL strategy improves program correctness and efficiency.

3. LLM-GS enables non-programmers to generate policies from natural language.

Abstract

Programmatic reinforcement learning (PRL) has been explored for representing policies through programs as a means to achieve interpretability and generalization. Despite promising outcomes, current state-of-the-art PRL methods are hindered by sample inefficiency, necessitating tens of millions of program-environment interactions. To tackle this challenge, we introduce a novel LLM-guided search framework (LLM-GS). Our key insight is to leverage the programming expertise and common sense reasoning of LLMs to enhance the efficiency of assumption-free, random-guessing search methods. We address the challenge of LLMs' inability to generate precise and grammatically correct programs in domain-specific languages (DSLs) by proposing a Pythonic-DSL strategy: an LLM is instructed to first generate Python code and then convert it into DSL programs. To further optimize the LLM-generated…
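As a rough illustration of the idea of seeding a local search with LLM-proposed programs, the sketch below runs plain greedy hill climbing over toy programs. It is not the paper's Scheduled Hill Climbing algorithm, whose details are not given in the text above. Everything here is a hypothetical stand-in: a "program" is a fixed-length, non-empty list of action tokens, `evaluate` is a toy reward that compares against a hidden `TARGET` instead of running an environment, and the `seeds` argument plays the role of programs that an LLM might have proposed.

```python
import random

# Hypothetical toy setup (assumption, not from the paper): programs are
# fixed-length lists of action tokens, scored against a hidden target.
ACTIONS = ["move", "turnLeft", "turnRight", "pickMarker", "putMarker"]
TARGET = ["move", "move", "turnLeft", "move", "putMarker"]


def evaluate(program):
    """Toy return in [0, 1]: fraction of positions that match TARGET."""
    matches = sum(a == b for a, b in zip(program, TARGET))
    return matches / max(len(program), len(TARGET))


def mutate(program, rng):
    """Return a neighbor that differs in at most one randomly chosen token."""
    neighbor = list(program)
    neighbor[rng.randrange(len(neighbor))] = rng.choice(ACTIONS)
    return neighbor


def hill_climb(seeds, num_neighbors=16, max_steps=200, rng=None):
    """Greedy hill climbing that starts from the best seed program.

    Each step samples `num_neighbors` mutations of the current best program
    and moves only if the best neighbor strictly improves the score.
    """
    rng = rng or random.Random(0)
    best = max(seeds, key=evaluate)
    best_score = evaluate(best)
    for _ in range(max_steps):
        if best_score >= 1.0:
            break  # the toy reward cannot go higher
        candidates = [mutate(best, rng) for _ in range(num_neighbors)]
        challenger = max(candidates, key=evaluate)
        challenger_score = evaluate(challenger)
        if challenger_score > best_score:
            best, best_score = challenger, challenger_score
    return best, best_score


if __name__ == "__main__":
    # Stand-in for LLM-proposed initial programs (assumption).
    seeds = [["move", "turnLeft", "move", "move", "move"]]
    program, score = hill_climb(seeds)
    print(program, score)
```

In this toy setting, a seed that already matches part of the target starts the search at a higher score than an arbitrary program would. That loosely mirrors the abstract's motivation for using LLM output to improve the efficiency of search, though the real method evaluates programs through environment interactions.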

Videos

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Multi-Agent Systems and Negotiation · Software Engineering Research