Prompt Optimization for LLM Code Generation via Reinforcement Learning

Ali Mohammadi Esfahani; Nafiseh Kahani; Samuel A. Ajila

arXiv:2605.19102 · cs.SE · May 20, 2026

TL;DR

This paper introduces a reinforcement learning framework that optimizes prompts for large language models in code generation, improving strict Pass@1 over the EPiC, Reflexion, and Random-Hybrid baselines on MBPP+.

Contribution

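The authors develop a PPO-based prompt refinement method that combines a hybrid action space (direct generation, genetic lexical mutation, and semantic rewriting) with shaped rewards derived from unit-test feedback. It outperforms EPiC, Reflexion, and Random-Hybrid on code generation tasks.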

Findings

01

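PPO-based prompt optimization reaches strict Pass@1 scores of 57.58%, 64.80%, and 85.50% on the 500-task MBPP+ test set with CodeT5+, CodeLLaMA, and DeepSeek-Coder, respectively.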

02

The method outperforms EPiC, Reflexion, and Random-Hybrid baselines.

03

Functional correctness in code generation is enhanced through test-driven shaped rewards.
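
The test-driven reward idea in the findings above can be illustrated with a minimal sketch. The reward formula is not given here, so the shaping below (fraction of unit tests passed, plus a bonus when every test passes) is an assumption chosen for illustration, and the names `run_tests`, `shaped_reward`, and `strict_pass_at_1` are hypothetical rather than taken from the paper.

```python
from typing import Callable, Sequence


def run_tests(candidate: Callable[[int], int],
              tests: Sequence[tuple[int, int]]) -> list[bool]:
    """Run each (input, expected) pair; an exception counts as a failure."""
    results = []
    for arg, expected in tests:
        try:
            results.append(candidate(arg) == expected)
        except Exception:
            results.append(False)
    return results


def shaped_reward(results: Sequence[bool], full_pass_bonus: float = 1.0) -> float:
    """ASSUMED shaping: fraction of tests passed, plus a bonus if all pass."""
    if not results:
        return 0.0
    fraction = sum(results) / len(results)
    return fraction + (full_pass_bonus if all(results) else 0.0)


def strict_pass_at_1(per_task_results: Sequence[Sequence[bool]]) -> float:
    """Share of tasks whose single generated program passes every test."""
    solved = sum(1 for r in per_task_results if r and all(r))
    return solved / len(per_task_results)


# Toy task: "double the input", with one correct and one buggy candidate.
tests = [(1, 2), (2, 4), (3, 6)]
correct = run_tests(lambda x: 2 * x, tests)
buggy = run_tests(lambda x: 2 * x if x < 3 else 0, tests)
```

A partially correct program earns a nonzero reward here even though it does not count toward strict Pass@1, which is the kind of denser signal that shaped rewards are meant to provide.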

Abstract

Large Language Models (LLMs) can generate code from natural language, but their performance is highly sensitive to prompt formulation. We propose a reinforcement-learning-based framework that models prompt refinement as a sequential decision-making problem. A Proximal Policy Optimization (PPO) agent iteratively improves prompts using a hybrid action space that combines direct generation, genetic lexical mutation, and semantic rewriting, guided by shaped rewards derived from unit-test feedback. We evaluate the framework on MBPP+, HumanEval+, and APPS using CodeT5+, CodeLLaMA, and DeepSeek-Coder as frozen code generators. On the 500-task MBPP+ test set, the PPO agent achieves strict Pass@1 scores of 57.58%, 64.80%, and 85.50%, respectively, outperforming EPiC, Reflexion, and Random-Hybrid. Soft-Pass@1 reaches 67.90%, 73.10%, and 88.20%, respectively. Similar improvements are observed on…
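
To make the "sequential decision-making" framing concrete, the following is a rough sketch of one refinement episode over a three-way action space. It is not the authors' implementation: the action semantics are assumed (in particular, "generate" is read as keeping the prompt unchanged and scoring it directly), `mutate` and `rewrite` are toy stand-ins for genetic lexical mutation and semantic rewriting, `toy_score` replaces the frozen code generator plus unit tests, and `policy` is any callable standing in for the PPO agent.

```python
import random
from dataclasses import dataclass
from typing import Callable

ACTIONS = ("generate", "mutate", "rewrite")  # assumed labels for the three action types
HINT = " Return the result; do not print it."


def mutate(prompt: str, rng: random.Random) -> str:
    """Toy lexical mutation: swap two adjacent words."""
    words = prompt.split()
    if len(words) < 2:
        return prompt
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


def rewrite(prompt: str) -> str:
    """Toy stand-in for semantic rewriting: append a clarifying instruction once."""
    return prompt if prompt.endswith(HINT) else prompt + HINT


def toy_score(prompt: str) -> float:
    """Hypothetical reward in [0, 1]; a real setup would generate code and run tests."""
    return 1.0 if "Return the result" in prompt else 0.5


@dataclass
class Step:
    action: str
    prompt: str
    reward: float


def refine(prompt: str,
           score: Callable[[str], float],
           policy: Callable[[str], str],
           max_steps: int = 5,
           seed: int = 0) -> list[Step]:
    """Roll out one episode: pick an action, edit the prompt, score it, repeat."""
    rng = random.Random(seed)
    trajectory = []
    for _ in range(max_steps):
        action = policy(prompt)
        if action == "mutate":
            prompt = mutate(prompt, rng)
        elif action == "rewrite":
            prompt = rewrite(prompt)
        # "generate" leaves the prompt as-is and only scores it (assumption).
        reward = score(prompt)
        trajectory.append(Step(action, prompt, reward))
        if reward >= 1.0:  # stop once the toy reward is maximal
            break
    return trajectory


start = "Write a function that doubles x."
rewritten = refine(start, toy_score, lambda p: "rewrite")
unchanged = refine(start, toy_score, lambda p: "generate", max_steps=3)
```

The trajectories of `(action, prompt, reward)` steps returned by `refine` are the kind of data a policy-gradient method such as PPO would train on; the training update itself is omitted from this sketch.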
