StaRPO: Stability-Augmented Reinforcement Policy Optimization
Jinghan Zhang, Fengran Mo, Tharindu Cyril Weerasooriya, Ruimin Dai, Xiaoyan Han, Yanjie Fu, Dakuo Wang, Kunpeng Liu

TL;DR
StaRPO introduces a stability-focused reinforcement learning framework for large language models, improving logical consistency and reasoning quality by explicitly incorporating stability metrics into the training process.
Contribution
The paper proposes StaRPO, a novel RL framework that uses explicit stability metrics to enhance reasoning stability and accuracy in language models.
Findings
StaRPO's stability metrics correlate with logic errors.
It outperforms baselines on four reasoning benchmarks.
It improves both accuracy and logical stability.
Abstract
Reinforcement learning (RL) is effective at improving the accuracy of large language models on complex reasoning tasks. Existing RL policy optimization frameworks rely on final-answer correctness as the feedback signal and rarely capture the internal logical structure of the reasoning process. Consequently, models may generate responses that are fluent and semantically relevant but logically inconsistent, structurally erratic, or redundant. To address this, we propose StaRPO, a stability-augmented reinforcement learning framework that explicitly incorporates reasoning stability into the optimization objective. StaRPO decomposes stability into two lightweight, computable metrics: the Autocorrelation Function (ACF), which evaluates local step-to-step coherence, and Path Efficiency (PE), which evaluates the global goal-directedness of the reasoning trajectory. These stability rewards are combined with task…
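The abstract names the two stability metrics, but this summary does not give their formulas. As a rough illustration only, the sketch below assumes each reasoning step has already been embedded as a vector and uses stand-in definitions: the mean cosine similarity between steps a fixed lag apart for ACF-style local coherence, and net displacement divided by total path length for PE-style goal-directedness. The function names `acf_score` and `path_efficiency`, the embedding assumption, and both definitions are hypothetical and may differ from the paper's actual method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def acf_score(steps, lag=1):
    """ASSUMED stand-in for ACF: mean cosine similarity between
    step embeddings that are `lag` positions apart."""
    pairs = list(zip(steps, steps[lag:]))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def path_efficiency(steps):
    """ASSUMED stand-in for PE: straight-line distance from the first
    to the last step embedding, divided by the total distance travelled.
    1.0 means a perfectly direct trajectory; values near 0 mean wandering."""
    if len(steps) < 2:
        return 0.0
    total = sum(math.dist(u, v) for u, v in zip(steps, steps[1:]))
    return math.dist(steps[0], steps[-1]) / total if total else 0.0


# Toy 2-D "embeddings" of reasoning steps (hypothetical data).
direct = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
detour = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
drifting = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

print(path_efficiency(direct))   # direct path
print(path_efficiency(detour))   # same endpoints, longer route
print(acf_score(drifting))       # one coherent transition, one abrupt shift
```

Under these assumed definitions, a trajectory that heads straight to its endpoint scores higher on path efficiency than one that detours, and a sequence with an abrupt topic shift scores lower on the coherence measure than one whose consecutive steps stay aligned.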
