Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains
Soumya Rani Samineni, Durgesh Kalwar, Vardaan Gangal, Siddhant Bhambri, Subbarao Kambhampati

TL;DR
This paper investigates how reinforcement learning post-training affects the reasoning process of large language models in math tasks, revealing that it improves local coherence but not necessarily the correctness of solutions.
Contribution
The study introduces a trace coherence measure based on First-Order Logic to analyze the effects of RL post-training on reasoning traces in LLMs.
Findings
RL post-training improves trace coherence, especially on challenging problems.
Enhanced local coherence does not always lead to valid or correct solutions.
Claims of improved reasoning should consider the distinction between coherence and validity.
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR)-based post-training of Large Language Models (LLMs) has been shown to improve accuracy on reasoning tasks and continues to attract significant attention. Existing RLVR methods, however, typically treat all tokens uniformly without accounting for token-level advantages. These methods are primarily evaluated on final answer correctness or Pass@K accuracy, yet they are accompanied by claims that RL post-training leads to improved reasoning traces. This motivates our investigation into the effect of RL post-training on intermediate tokens, which are not directly incentivized. To study this, we design an experimental setup using the GRPO algorithm with the Qwen-2.5-0.5B model on the GSM8K dataset. We introduce trace coherence, a First-Order Logic (FOL)-based measure to capture the consistency of reasoning steps by identifying errors in the…
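The abstract's remark that tokens are treated uniformly can be illustrated with a minimal sketch of a GRPO-style group-relative advantage. This is not the authors' code. It assumes binary verifiable rewards (1 for a correct final answer, 0 otherwise), normalization by the group's population standard deviation, and a small `eps` for numerical stability; the function names `group_relative_advantages` and `broadcast_to_tokens` are hypothetical. The point it shows is that every token in a sampled response receives the same scalar advantage, so intermediate reasoning tokens get no separate credit.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each response's reward against its sampling group.

    rewards: one scalar reward per response sampled for the same prompt.
    eps is an assumed stabilizer for groups where all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_tokens(advantages, token_counts):
    """Copy each response-level advantage onto every token of that response."""
    return [[a] * n for a, n in zip(advantages, token_counts)]


# Hypothetical group of four sampled answers to one prompt:
# two reach the correct final answer (reward 1), two do not (reward 0).
rewards = [1.0, 0.0, 0.0, 1.0]
token_counts = [3, 2, 4, 1]  # illustrative response lengths in tokens

advantages = group_relative_advantages(rewards)
per_token = broadcast_to_tokens(advantages, token_counts)
```

In this sketch a response with a correct final answer has all of its tokens reinforced equally, including any flawed intermediate step, which is why final-answer accuracy alone says little about the quality of the trace.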
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
