Prior Prompt Engineering for Reinforcement Fine-Tuning

Pittawat Taveekitworachai; Potsawee Manakul; Sarana Nutanong; Kunat Pipatanakul

arXiv:2505.14157 · cs.CL · September 11, 2025

TL;DR

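This paper examines how prior prompt engineering (pPE), the design of the instructions prepended to queries during training, influences reinforcement fine-tuning (RFT) of language models. It shows that different pPE strategies lead models to internalize distinct behaviors, and that the resulting models outperform their inference-time prompt engineering (iPE) counterparts.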

Contribution

The paper translates five inference-time prompt engineering (iPE) strategies into prior prompt engineering (pPE) approaches and evaluates them, showing that each one steers model behavior and improves performance over its iPE counterpart.

Findings

01. Null-example pPE yields the highest performance gains.

02. All pPE approaches outperform their iPE counterparts.

03. Different pPE strategies induce distinct behavioral styles.

Abstract

This paper investigates prior prompt engineering (pPE) in the context of reinforcement fine-tuning (RFT), where language models (LMs) are incentivized to exhibit behaviors that maximize performance through reward signals. While existing RFT research has primarily focused on algorithms, reward shaping, and data curation, the design of the prior prompt--the instructions prepended to queries during training to elicit behaviors such as step-by-step reasoning--remains underexplored. We investigate whether different pPE approaches can guide LMs to internalize distinct behaviors after RFT. Inspired by inference-time prompt engineering (iPE), we translate five representative iPE strategies--reasoning, planning, code-based reasoning, knowledge recall, and null-example utilization--into corresponding pPE approaches. We experiment with Qwen2.5-7B using each of the pPE approaches, then evaluate…
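The idea of a prior prompt, an instruction prepended to every query during training, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the prompt wordings, the strategy keys, and the helper name `build_training_prompt` are hypothetical and are not the prompts or code used in the paper.

```python
# Hypothetical prior prompts, one per pPE strategy named in the abstract.
# The wording is illustrative only; the paper's actual prompts may differ.
PRIOR_PROMPTS = {
    "reasoning": "Think step by step before giving the final answer.",
    "planning": "First write a plan, then follow it to reach the final answer.",
    "code": "Work through the problem by writing code, then state the final answer.",
    "knowledge": "Recall relevant facts and definitions, then give the final answer.",
    "null_example": "Make use of examples as you see fit, even though none are provided, "
                    "then give the final answer.",
}


def build_training_prompt(strategy: str, query: str) -> str:
    """Prepend the prior prompt for `strategy` to a training query.

    Hypothetical helper: during RFT, the returned string would be the input
    the policy model sees for each training example.
    """
    if strategy not in PRIOR_PROMPTS:
        raise KeyError(f"unknown pPE strategy: {strategy!r}")
    return f"{PRIOR_PROMPTS[strategy]}\n\n{query}"


if __name__ == "__main__":
    # Show the same query under each hypothetical prior prompt.
    for name in PRIOR_PROMPTS:
        print(f"--- {name} ---")
        print(build_training_prompt(name, "What is 17 * 24?"))
```

In this sketch the only thing that changes between training runs is the prepended instruction; the queries and the reward signal stay the same, which is what would make the prior prompt an isolated variable to study.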

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Reinforcement Learning in Robotics