Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, Frederic Sala

TL;DR
This paper introduces SPARKLE, a detailed framework to analyze how reinforcement learning improves language models' reasoning, revealing that RL enhances internal strategy formulation and knowledge integration rather than external plan execution.
Contribution
The paper presents SPARKLE, a novel analytic framework for dissecting the effects of RL on language models' reasoning, and proposes SparkleRL-PSS, a training approach that reuses hard problems by pairing them with partial scaffolding.
Findings
RL-tuned models are more robust when given explicit step-by-step plans, showing smaller performance drops than base or SFT models.
RL improves models' ability to integrate knowledge.
Hard problems with partial scaffolding can be effectively reused for training.
Abstract
Reinforcement learning (RL) has become the dominant paradigm for improving the performance of language models on complex reasoning tasks. Despite the substantial empirical gains demonstrated by RL-based training methods like GRPO, a granular understanding of why and how RL enhances performance is still lacking. To bridge this gap, we introduce SPARKLE, a fine-grained analytic framework to dissect the effects of RL across three key dimensions: (1) plan following and execution, (2) knowledge integration, and (3) chain of subproblems. Using this framework, we gain insights beyond mere accuracy. For instance, providing models with explicit human-crafted, step-by-step plans can surprisingly degrade performance on the most challenging benchmarks, yet RL-tuned models exhibit greater robustness, experiencing markedly smaller performance drops than base or SFT models. This suggests that RL may…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Reinforcement Learning in Robotics
Methods: Balanced Selection
