RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
Fu-Chieh Chang, Yu-Ting Lee, Hui-Ying Shih, Yi Hsuan Tseng, and Pei-Yuan Wu

TL;DR
This paper develops a theoretical framework to explain how reinforcement learning enhances reasoning capabilities in large language models through the STaR method, addressing the gap between empirical success and theoretical understanding.
Contribution
It provides theoretical criteria for model quality, analyzes policy improvement, convergence conditions, and robustness of STaR in enhancing LLM reasoning.
Findings
Criteria for pre-trained model quality needed for reasoning improvement
Analysis of why reasoning improves iteratively with STaR
Conditions under which STaR converges to optimal reasoning policies
Abstract
The reasoning abilities of large language models (LLMs) have improved with chain-of-thought (CoT) prompting, allowing models to solve complex tasks stepwise. However, training CoT capabilities requires detailed reasoning data, which is often scarce. The self-taught reasoner (STaR) framework addresses this by using reinforcement learning to automatically generate reasoning steps, reducing reliance on human-labeled data. Although STaR and its variants have demonstrated empirical success, a theoretical foundation explaining these improvements is lacking. This work provides a theoretical framework for understanding the effectiveness of reinforcement learning on CoT reasoning and STaR. Our contributions are: (1) criteria for the quality of pre-trained models necessary to initiate effective reasoning improvement; (2) an analysis of policy improvement, showing why LLM reasoning improves…
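The loop the abstract describes can be illustrated with a small toy. The sketch below is not the paper's formal model. It assumes a STaR-style iteration of three stages: sample reasoning chains from the current policy, keep only the chains whose final answer passes a check, and refit the policy on the kept chains. The per-step probability table, the parity-based answer check, and all function names are hypothetical choices made for illustration only.

```python
import random


def sample_chain(policy, rng):
    """Sample one reasoning chain: True marks a correct step, False a wrong one.

    `policy[i]` is the (hypothetical) probability of taking the correct step i.
    """
    return [rng.random() < p for p in policy]


def answer_is_correct(chain):
    """Toy verifier (an assumption of this sketch, not the paper's setup).

    The final answer counts as right when the number of wrong steps is even,
    so two mistakes can cancel and a flawed chain may still pass the filter.
    """
    wrong_steps = sum(1 for step_ok in chain if not step_ok)
    return wrong_steps % 2 == 0


def star_iteration(policy, n_samples, rng):
    """One STaR-style round: sample, filter by final answer, refit the policy."""
    samples = (sample_chain(policy, rng) for _ in range(n_samples))
    kept = [chain for chain in samples if answer_is_correct(chain)]
    if not kept:
        # Nothing passed the filter, so there is no data to refit on.
        return list(policy)
    # Stand-in for fine-tuning: set each step probability to the frequency
    # of the correct step among the kept chains.
    return [sum(chain[i] for chain in kept) / len(kept) for i in range(len(policy))]


def run_star(initial_policy, n_iterations, n_samples, seed=0):
    """Run several rounds and return the policy after each one."""
    rng = random.Random(seed)
    history = [list(initial_policy)]
    for _ in range(n_iterations):
        history.append(star_iteration(history[-1], n_samples, rng))
    return history


if __name__ == "__main__":
    for round_index, policy in enumerate(run_star([0.6, 0.6, 0.6], 10, 20000)):
        print(round_index, [round(p, 3) for p in policy])
```

In this toy, a starting policy that picks the correct step somewhat more often than not drifts toward always picking it, even though the filter only inspects the final answer and occasionally admits chains with two wrong steps. Whether and when such improvement holds for real LLMs is the question the paper treats formally.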
Taxonomy
Topics: Reinforcement Learning in Robotics · Fuzzy Logic and Control Systems
