PRL: Prompts from Reinforcement Learning

Paweł Batorski; Adrian Kosmala; Paul Swoboda

arXiv:2505.14412·cs.AI·April 21, 2026

TL;DR

PRL introduces a reinforcement learning-based method for automatic prompt generation that outperforms existing approaches across multiple NLP benchmarks, reducing reliance on expert-crafted prompts.

Contribution

The paper presents a novel RL-based approach for generating effective prompts, capable of producing unseen few-shot examples and achieving state-of-the-art results.

Findings

01 Surpasses APE by 2.58% and EvoPrompt by 1.00% in classification accuracy

02 Improves ROUGE scores by 4.32 on summarization

03 Improves SARI scores by 6.93 on simplification

Abstract

Effective prompt engineering remains a central challenge in fully harnessing the capabilities of LLMs. While well-designed prompts can dramatically enhance performance, crafting them typically demands expert intuition and a nuanced understanding of the task. Moreover, the most impactful prompts often hinge on subtle semantic cues, ones that may elude human perception but are crucial for guiding LLM behavior. In this paper, we introduce PRL (Prompts from Reinforcement Learning), a novel RL-based approach for automatic prompt generation. Unlike previous methods, PRL can produce novel few-shot examples that were not seen during training. Our approach achieves state-of-the-art performance across a range of benchmarks, including text classification, simplification, and summarization. On the classification task, it outperforms APE by 2.58% and EvoPrompt by 1.00%.…

Code & Models

Repositories

Batorskq/prl (GitHub)
