Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search
Max Liu, Chan-Hung Yu, Wei-Hsu Lee, Cheng-Wei Hung, Yen-Chun Chen, Shao-Hua Sun

TL;DR
This paper introduces LLM-GS, a framework that uses large language models to guide search in programmatic reinforcement learning, significantly improving sample efficiency and enabling non-programmers to generate effective policies from natural language descriptions.
Contribution
The paper proposes an LLM-guided search framework (LLM-GS) that combines a Pythonic-DSL strategy with a Scheduled Hill Climbing algorithm to make program synthesis in reinforcement learning more efficient.
Findings
LLM-GS outperforms existing PRL methods in the Karel domain.
The Pythonic-DSL strategy improves program correctness and efficiency.
LLM-GS enables non-programmers to generate policies from natural language.
Abstract
Programmatic reinforcement learning (PRL) has been explored for representing policies through programs as a means to achieve interpretability and generalization. Despite promising outcomes, current state-of-the-art PRL methods are hindered by sample inefficiency, necessitating tens of millions of program-environment interactions. To tackle this challenge, we introduce a novel LLM-guided search framework (LLM-GS). Our key insight is to leverage the programming expertise and common sense reasoning of LLMs to enhance the efficiency of assumption-free, random-guessing search methods. We address the challenge of LLMs' inability to generate precise and grammatically correct programs in domain-specific languages (DSLs) by proposing a Pythonic-DSL strategy: an LLM is instructed to first generate Python code and then convert it into a DSL program. To further optimize the LLM-generated…
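The abstract is cut off before it describes how the LLM-generated programs are optimized, so the following is only a generic illustration of the idea of seeding a local search with candidate programs instead of random guesses. It is not the paper's Scheduled Hill Climbing algorithm. Everything in it is an assumption made for illustration: the token-list stand-in for a DSL program, the `TARGET`-matching reward, the `mutate` operator, and the linear `neighbor_budget` schedule.

```python
import random

# Hypothetical stand-in for a DSL: a program is a flat list of action tokens.
ACTIONS = ["move", "turnLeft", "turnRight", "putMarker", "pickMarker"]
# Hypothetical optimum, used only to define a toy reward.
TARGET = ["move", "move", "turnLeft", "move", "putMarker"]


def evaluate(program):
    """Toy reward in [0, 1]: positional matches with TARGET, length-penalized."""
    matches = sum(a == b for a, b in zip(program, TARGET))
    return matches / max(len(program), len(TARGET))


def mutate(program, rng):
    """Return a neighbor by replacing, deleting, or inserting one token."""
    neighbor = list(program)
    op = rng.choice(["replace", "insert", "delete"])
    if op == "replace" and neighbor:
        neighbor[rng.randrange(len(neighbor))] = rng.choice(ACTIONS)
    elif op == "delete" and len(neighbor) > 1:
        del neighbor[rng.randrange(len(neighbor))]
    else:
        neighbor.insert(rng.randrange(len(neighbor) + 1), rng.choice(ACTIONS))
    return neighbor


def neighbor_budget(used, total, k_min=4, k_max=32):
    """Assumed schedule: try more neighbors per step as the budget is spent."""
    frac = min(1.0, used / total)
    return int(k_min + frac * (k_max - k_min))


def seeded_hill_climb(seeds, total_evals=2000, seed=0):
    """Hill-climb from each seed program; return (best, best_score, evals_used)."""
    rng = random.Random(seed)
    used = 0
    best, best_score = None, -1.0
    for start in seeds:
        if used >= total_evals:
            break
        current, score = list(start), evaluate(start)
        used += 1
        improved = True
        while improved and used < total_evals and score < 1.0:
            improved = False
            for _ in range(neighbor_budget(used, total_evals)):
                if used >= total_evals:
                    break
                candidate = mutate(current, rng)
                cand_score = evaluate(candidate)
                used += 1
                if cand_score > score:
                    # First-improvement move: accept and restart the neighborhood.
                    current, score = candidate, cand_score
                    improved = True
                    break
        if score > best_score:
            best, best_score = current, score
    return best, best_score, used


if __name__ == "__main__":
    # Seeds play the role of LLM-proposed programs; here they are hand-written.
    seeds = [["move", "turnLeft"], ["move", "move", "putMarker"]]
    print(seeded_hill_climb(seeds, total_evals=500, seed=1))
```

In this sketch each call to `evaluate` stands in for one program-environment interaction, so `total_evals` is the quantity a sample-efficient method would try to keep small. Better starting programs leave less distance for the local search to cover.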
Taxonomy
Topics: Reinforcement Learning in Robotics · Multi-Agent Systems and Negotiation · Software Engineering Research
