Learning Dynamics in RL Post-Training for Language Models
Akiyoshi Tomihari

TL;DR
This paper analyzes the learning dynamics of reinforcement learning (RL) post-training in language models using an empirical neural tangent kernel (NTK) framework. It shows how limited variability in feature representations can increase model confidence and reduce output diversity, and it proposes a new classifier-first training strategy.
Contribution
It introduces an NTK-based analysis of RL post-training dynamics that explains the increase in model confidence and the reduction in output diversity, and it proposes CF-RL, a classifier-first training strategy, to improve training efficiency.
Findings
RL updates increase model confidence due to limited feature variability
CF-RL accelerates training and enhances model confidence
The mechanism of CF-RL differs from supervised linear probing
Abstract
Reinforcement learning (RL) post-training is a critical stage in modern language model development, playing a key role in improving alignment and reasoning ability. However, several phenomena remain poorly understood, including the reduction in output diversity. To gain a broader understanding of RL post-training, we analyze the learning dynamics of RL post-training from a perspective that has been studied in supervised learning but remains underexplored in RL. We adopt an empirical neural tangent kernel (NTK) framework and decompose the NTK into two components to characterize how RL updates propagate across training samples. Our analysis reveals that limited variability in feature representations can cause RL updates to systematically increase model confidence, providing an explanation for the commonly observed reduction in output diversity after RL post-training. Furthermore, we show…
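The propagation of an RL update across samples can be illustrated with a minimal sketch. The abstract does not say what the two NTK components are, so this example makes its own simplifying assumption: only the final linear layer (the "classifier" head, logits z = W·phi(x)) is trained, on top of fixed features. Under that assumption the empirical NTK between two samples reduces to the inner product of their features times an identity over the vocabulary. The function names (`head_ntk`, `reinforce_step`) and the toy features are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def head_ntk(features):
    # Scalar part of the empirical NTK for logits z = W @ phi(x), taken
    # with respect to W only (an assumption of this sketch). The full
    # block is K(x_i, x_j) = <phi_i, phi_j> * I_V.
    return features @ features.T

def reinforce_step(W, phi, action, advantage, lr):
    # One REINFORCE-style ascent step on advantage * log pi(action | x).
    p = softmax(W @ phi)
    g = -p
    g[action] += 1.0  # d log pi(a|x) / d logits = onehot(a) - p
    return W + lr * advantage * np.outer(g, phi)

# Toy setup: 5 "tokens", 4-dimensional features (illustrative values only).
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 4))
phi_a = np.array([1.0, 0.0, 0.0, 0.0])  # sample that receives the update
phi_b = np.array([0.9, 0.1, 0.0, 0.0])  # nearly the same feature as phi_a
phi_c = np.array([0.0, 0.0, 1.0, 0.0])  # orthogonal to phi_a

K = head_ntk(np.stack([phi_a, phi_b, phi_c]))
W_new = reinforce_step(W, phi_a, action=2, advantage=1.0, lr=0.5)

# Logit shift at another sample = lr * advantage * K[a, other] * (onehot - p_a).
shift_b = W_new @ phi_b - W @ phi_b
shift_c = W_new @ phi_c - W @ phi_c
prob_b_before = softmax(W @ phi_b)[2]
prob_b_after = softmax(W_new @ phi_b)[2]
```

In this toy setting, the update computed on `phi_a` shifts the logits at `phi_b` by an amount proportional to `K[0, 1]`, so the probability of the rewarded token rises there as well, while the orthogonal sample `phi_c` is untouched. When most features are nearly parallel, every positive-advantage update therefore sharpens the distribution on many samples at once, which is one way to read the link between limited feature variability and rising confidence.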
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
