Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs
Yifan Zhou, Sachin Grover, Mohamed El Mistiri, Kamalesh Kalirathnam, Pratyush Kerhalkar, Swaroop Mishra, Neelesh Kumar, Sanket Gaurav, Oya Aran, Heni Ben Amor

TL;DR
Prompted Policy Search (ProPS) introduces a novel reinforcement learning framework that integrates linguistic and numerical reasoning via large language models, enhancing sample efficiency and generalization across diverse tasks.
Contribution
ProPS places an LLM at the center of the policy optimization process, so that policy updates are proposed directly from combined semantic and numerical feedback rather than by augmenting existing RL components with language.
Findings
Outperforms baseline algorithms on 8 of 15 tasks
Incorporating domain knowledge improves learning efficiency
LLMs can perform in-context numerical optimization
Abstract
Reinforcement Learning (RL) traditionally relies on scalar reward signals, limiting its ability to leverage the rich semantic knowledge often available in real-world tasks. In contrast, humans learn efficiently by combining numerical feedback with language, prior knowledge, and common sense. We introduce Prompted Policy Search (ProPS), a novel RL method that unifies numerical and linguistic reasoning within a single framework. Unlike prior work that augments existing RL components with language, ProPS places a large language model (LLM) at the center of the policy optimization loop, directly proposing policy updates based on both reward feedback and natural language input. We show that LLMs can perform numerical optimization in-context, and that incorporating semantic signals, such as goals, domain knowledge, and strategy hints, can lead to more informed exploration and sample-efficient…
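The loop described in the abstract (an LLM reads reward feedback plus a natural-language task description and proposes the next policy) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the policy is assumed to be a flat list of parameters, the prompt format is hypothetical, and the LLM call is replaced by a hypothetical stub, `stub_llm_propose`, that simply perturbs the best parameters seen so far (plain random search). In a real setup, the stub would send `prompt` to an LLM and parse the proposed parameters from its reply.

```python
import random
from typing import Callable, List, Tuple

Params = List[float]
History = List[Tuple[Params, float]]


def build_prompt(history: History, task_description: str) -> str:
    """Format a hypothetical prompt: task description plus (params, reward) history."""
    lines = [
        f"Task: {task_description}",
        "Previously evaluated policy parameters and their episodic rewards:",
    ]
    for params, reward in history:
        lines.append(f"params={[round(p, 3) for p in params]}, reward={reward:.3f}")
    lines.append("Propose new parameters that achieve a higher reward.")
    return "\n".join(lines)


def stub_llm_propose(prompt: str, history: History, rng: random.Random) -> Params:
    """Hypothetical stand-in for an LLM call.

    It ignores `prompt` and perturbs the best parameters found so far;
    a real system would query an LLM with `prompt` instead.
    """
    best_params, _ = max(history, key=lambda pr: pr[1])
    return [p + rng.gauss(0.0, 0.5) for p in best_params]


def prompted_policy_search(
    evaluate: Callable[[Params], float],
    init_params: Params,
    task_description: str,
    iterations: int = 50,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Outer loop: evaluate, describe results in text, ask the proposer, repeat."""
    rng = random.Random(seed)
    history: History = [(init_params, evaluate(init_params))]
    for _ in range(iterations):
        prompt = build_prompt(history, task_description)
        candidate = stub_llm_propose(prompt, history, rng)
        history.append((candidate, evaluate(candidate)))
    # Return the best (params, reward) pair observed during the search.
    return max(history, key=lambda pr: pr[1])


# Toy usage: reward peaks at the (assumed) target parameters [1.0, -2.0].
target = [1.0, -2.0]
toy_reward = lambda p: -sum((x - t) ** 2 for x, t in zip(p, target))
best_params, best_reward = prompted_policy_search(
    toy_reward, [0.0, 0.0], "Move the parameters toward the goal."
)
print(best_params, best_reward)
```

Keeping the full evaluation history in the prompt is what lets a language model do the in-context numerical optimization the paper describes; the task string is where semantic signals such as goals or strategy hints would enter.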
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Robot Manipulation and Learning
