Behavior Injection: Preparing Language Models for Reinforcement Learning
Zhepeng Cen, Yihang Yao, William Han, Zuxin Liu, Ding Zhao

TL;DR
This paper introduces behavior injection, a task-agnostic data augmentation method applied to supervised finetuning data before reinforcement learning. It makes language models more RL-ready and yields larger RL performance gains on reasoning benchmarks.
Contribution
The paper identifies two conditions for effective RL finetuning of LLMs, RL-informative rollout accuracy and strong data co-influence, and proposes behavior injection, a task-agnostic data augmentation applied before RL, to improve RL outcomes.
Findings
Behavior injection increases the performance gain obtained from RL.
Per-step influence analysis shows that rollout accuracy and data co-influence govern how well a model responds to RL finetuning.
The method is effective across multiple models and benchmarks.
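One way to read why rollout accuracy matters is sketched below. It assumes binary correctness rewards and a mean-centered, group-relative baseline; neither is specified in this summary, and the function names are hypothetical. Under those assumptions, a prompt whose rollouts are all correct or all wrong contributes no learning signal, while a prompt with mixed outcomes does.

```python
import numpy as np

def group_advantages(rewards):
    """Center rewards within one prompt's rollout group (assumed baseline)."""
    rewards = np.asarray(rewards, dtype=float)
    return rewards - rewards.mean()

def signal_strength(accuracy):
    """Variance of a Bernoulli (0/1) reward with success rate `accuracy`."""
    return accuracy * (1.0 - accuracy)

# Hypothetical rollout groups of size 4 for three prompts.
all_correct = group_advantages([1, 1, 1, 1])  # accuracy 1.0
all_wrong = group_advantages([0, 0, 0, 0])    # accuracy 0.0
mixed = group_advantages([1, 0, 1, 0])        # accuracy 0.5
```

With four rollouts, a group that is all correct or all wrong produces all-zero advantages, while the half-correct group produces advantages of +0.5 and -0.5. The reward variance `accuracy * (1 - accuracy)` is zero at both extremes and largest at 0.5, which is one plausible sense in which an intermediate rollout accuracy is "RL-informative".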
Abstract
Reinforcement learning (RL) has emerged as a powerful post-training technique to incentivize the reasoning ability of large language models (LLMs). However, LLMs can respond very inconsistently to RL finetuning: some show substantial performance gains, while others plateau or even degrade. To understand this divergence, we analyze the per-step influence of the RL objective and identify two key conditions for effective post-training: (1) RL-informative rollout accuracy, and (2) strong data co-influence, which quantifies how much the training data affects performance on other samples. Guided by these insights, we propose behavior injection, a task-agnostic data augmentation scheme applied prior to RL. Behavior injection enriches the supervised finetuning (SFT) data by seeding exploratory and exploitative behaviors, effectively making the model more RL-ready. We evaluate our method across…
Taxonomy
Topics: Evolutionary Algorithms and Applications · Software Engineering Research · Reinforcement Learning in Robotics
Methods: Balanced Selection
