Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs

Yifan Zhou; Sachin Grover; Mohamed El Mistiri; Kamalesh Kalirathnam; Pratyush Kerhalkar; Swaroop Mishra; Neelesh Kumar; Sanket Gaurav; Oya Aran; Heni Ben Amor

arXiv:2511.21928·cs.LG·December 1, 2025

TL;DR

Prompted Policy Search (ProPS) introduces a novel reinforcement learning framework that integrates linguistic and numerical reasoning via large language models, enhancing sample efficiency and generalization across diverse tasks.

Contribution

ProPS uniquely centers LLMs in the policy optimization process, enabling direct policy updates through combined semantic and numerical feedback, advancing RL capabilities.

Findings

01. Outperforms baseline algorithms on 8 of 15 tasks
02. Incorporating domain knowledge improves learning efficiency
03. LLMs can perform in-context numerical optimization

Abstract

Reinforcement Learning (RL) traditionally relies on scalar reward signals, limiting its ability to leverage the rich semantic knowledge often available in real-world tasks. In contrast, humans learn efficiently by combining numerical feedback with language, prior knowledge, and common sense. We introduce Prompted Policy Search (ProPS), a novel RL method that unifies numerical and linguistic reasoning within a single framework. Unlike prior work that augments existing RL components with language, ProPS places a large language model (LLM) at the center of the policy optimization loop, directly proposing policy updates based on both reward feedback and natural language input. We show that LLMs can perform numerical optimization in-context, and that incorporating semantic signals, such as goals, domain knowledge, and strategy hints, can lead to more informed exploration and sample-efficient…
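The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `build_prompt`, `prompted_search`, `make_mock_proposer`, and `toy_reward` are hypothetical, the reward is a toy quadratic rather than an RL environment, and the proposer is a random-perturbation stand-in where a real system would query an LLM with the prompt. The sketch only shows the shape of the idea: past parameters and rewards are written into a prompt together with a language hint, a proposer returns new parameters, and the result is evaluated and appended to the history.

```python
import random
from typing import Callable, List, Tuple

# Each history entry pairs a parameter vector with the reward it earned.
History = List[Tuple[List[float], float]]


def build_prompt(task_hint: str, history: History) -> str:
    """Format the language hint and past (parameters, reward) pairs as text."""
    lines = [f"Task: {task_hint}", "Past attempts (parameters -> reward):"]
    for params, reward in history:
        lines.append(f"  {[round(p, 3) for p in params]} -> {reward:.3f}")
    lines.append("Propose new parameters that should earn a higher reward.")
    return "\n".join(lines)


def prompted_search(
    evaluate: Callable[[List[float]], float],
    propose: Callable[[str, History], List[float]],
    init_params: List[float],
    task_hint: str,
    iterations: int = 50,
) -> Tuple[List[float], float]:
    """Ask the proposer for new parameters each round; return the best pair seen."""
    history: History = [(init_params, evaluate(init_params))]
    for _ in range(iterations):
        prompt = build_prompt(task_hint, history)
        candidate = propose(prompt, history)  # an LLM call would go here
        history.append((candidate, evaluate(candidate)))
    return max(history, key=lambda item: item[1])


def make_mock_proposer(seed: int = 0, step: float = 0.2):
    """Hypothetical stand-in for an LLM: perturbs the best parameters so far.

    It ignores the prompt text; a real proposer would read it.
    """
    rng = random.Random(seed)

    def propose(prompt: str, history: History) -> List[float]:
        best_params, _ = max(history, key=lambda item: item[1])
        return [p + rng.gauss(0.0, step) for p in best_params]

    return propose


def toy_reward(params: List[float]) -> float:
    """Assumed toy objective with its maximum (reward 0) at all-ones."""
    return -sum((p - 1.0) ** 2 for p in params)


best_params, best_reward = prompted_search(
    toy_reward,
    make_mock_proposer(),
    init_params=[0.0, 0.0],
    task_hint="Maximize the reward; larger parameter values tend to help.",
    iterations=200,
)
```

Swapping `make_mock_proposer()` for a function that sends the prompt to a language model and parses the returned numbers would keep the rest of the loop unchanged, which is the design point the sketch is meant to convey.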

Videos

Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Robot Manipulation and Learning