Behavior Injection: Preparing Language Models for Reinforcement Learning

Zhepeng Cen; Yihang Yao; William Han; Zuxin Liu; Ding Zhao

arXiv:2505.18917 · cs.LG · October 7, 2025

TL;DR

This paper introduces behavior injection, a data augmentation method that enhances language models' readiness for reinforcement learning, leading to improved performance gains on reasoning benchmarks.

Contribution

The paper identifies key conditions for effective RL finetuning of LLMs and proposes behavior injection to improve RL outcomes through task-agnostic data augmentation.

Findings

01. Behavior injection increases the performance gains obtained from RL.

02. Analysis reveals the importance of rollout accuracy and data co-influence.

03. The method is effective across multiple models and benchmarks.

Abstract

Reinforcement learning (RL) has emerged as a powerful post-training technique to incentivize the reasoning ability of large language models (LLMs). However, LLMs can respond very inconsistently to RL finetuning: some show substantial performance gains, while others plateau or even degrade. To understand this divergence, we analyze the per-step influence of the RL objective and identify two key conditions for effective post-training: (1) RL-informative rollout accuracy, and (2) strong data co-influence, which quantifies how much the training data affects performance on other samples. Guided by these insights, we propose behavior injection, a task-agnostic data augmentation scheme applied prior to RL. Behavior injection enriches the supervised finetuning (SFT) data by seeding exploratory and exploitative behaviors, effectively making the model more RL-ready. We evaluate our method across…
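One way to build intuition for the first condition, RL-informative rollout accuracy, is a minimal sketch under assumptions that the abstract does not state: rewards are binary (1 for a correct rollout, 0 otherwise), and advantages are computed by normalizing rewards within the group of rollouts sampled for one prompt (a GRPO-style choice, used here only for illustration). The helper names `group_advantages` and `reward_variance` are hypothetical. Under these assumptions, a prompt whose rollouts are all correct or all wrong yields zero advantage for every rollout, while intermediate accuracy yields a nonzero learning signal.

```python
import statistics


def group_advantages(rewards):
    """Mean-centered, std-normalized advantages for one prompt's rollouts.

    Assumption: group-normalized advantages (GRPO-style); this is an
    illustrative choice, not necessarily the objective used in the paper.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # All rollouts received the same reward: no rollout is preferred.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


def reward_variance(accuracy):
    """Variance of a binary (Bernoulli) reward with the given success rate."""
    return accuracy * (1.0 - accuracy)


if __name__ == "__main__":
    print(group_advantages([1, 1, 1, 1]))  # every rollout correct
    print(group_advantages([0, 0, 0, 0]))  # every rollout wrong
    print(group_advantages([1, 0, 1, 0]))  # mixed outcomes
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, reward_variance(p))
```

In this toy setting the reward variance `p * (1 - p)` vanishes at accuracy 0 and 1 and peaks at 0.5, which is one plausible reading of why rollout accuracy matters for RL finetuning; the paper's own per-step influence analysis may define the condition differently.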

Code & Models

Repositories

czp16/bridge-llm-reasoning (PyTorch, official)

Taxonomy

Topics: Evolutionary Algorithms and Applications · Software Engineering Research · Reinforcement Learning in Robotics

Methods: Balanced Selection