Enhancing the Code Reasoning Capabilities of LLMs via Consistency-based Reinforcement Learning
Zhanyue Qin, Jia Feng, Yibo Lyu, Yun Peng, Dianbo Sui, Cuiyun Gao, and Qing Liao

TL;DR
This paper introduces CodeThinker, a reinforcement learning framework that enhances large language models' code reasoning by leveraging stepwise consistency, resulting in improved accuracy and robustness across benchmarks and downstream tasks.
Contribution
It proposes a novel consistency-driven reinforcement learning approach with a reasoning-aware training module, dynamic sampling, and a reward mechanism to improve code reasoning in LLMs.
Findings
Achieves state-of-the-art accuracy, outperforming baselines by 4.3% on Qwen2.5-Coder-7B-Instruct.
Gains 5.33% and 3.11% accuracy on mathematical and code reasoning tasks across 17 languages.
Effectively alleviates reward hacking and enhances reasoning capabilities.
Abstract
Code reasoning refers to the task of predicting the output of a program given its source code and specific inputs. It can measure the reasoning capability of large language models (LLMs) and also benefit downstream tasks such as code generation and mathematical reasoning. Existing work has verified the effectiveness of reinforcement learning on this task. However, these methods design rewards solely based on final outputs or coarse-grained signals, and neglect the inherent consistency of the stepwise reasoning process in the task. As a result, they often suffer from sparse rewards or reward hacking, which keeps reinforcement learning from reaching its full potential. To alleviate these issues, we propose CodeThinker, a consistency-driven reinforcement learning framework for code reasoning. Specifically, CodeThinker has three key components: (1) a stepwise reasoning-aware model training…
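To make the contrast between output-only rewards and stepwise signals concrete, here is a minimal Python sketch. It is not CodeThinker's reward mechanism, which the truncated abstract does not specify. The toy program `f`, the helpers `trace_values`, `outcome_reward`, and `stepwise_score`, and the position-by-position matching rule are all hypothetical names and assumptions introduced only for illustration.

```python
import sys

# Toy program for the code reasoning task (an assumption, not from the paper).
PROGRAM = '''
def f(n):
    total = 0
    for i in range(n):
        total += i * i
    return total
'''


def trace_values(source, func_name, arg, var):
    """Run func_name(arg) and record each successive distinct value of local `var`."""
    namespace = {}
    exec(source, namespace)
    target_code = namespace[func_name].__code__
    values = []

    def tracer(frame, event, _arg):
        # Only trace the function under study; ignore every other frame.
        if frame.f_code is not target_code:
            return None
        if event in ("line", "return") and var in frame.f_locals:
            value = frame.f_locals[var]
            if not values or values[-1] != value:
                values.append(value)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = namespace[func_name](arg)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return result, values


def outcome_reward(predicted_output, expected):
    """Binary reward that looks only at the final answer."""
    return 1.0 if predicted_output.strip() == repr(expected) else 0.0


def stepwise_score(predicted_steps, actual_steps):
    """Hypothetical stepwise signal: share of positions where the steps agree."""
    longest = max(len(predicted_steps), len(actual_steps))
    if longest == 0:
        return 0.0
    matches = sum(p == a for p, a in zip(predicted_steps, actual_steps))
    return matches / longest


result, steps = trace_values(PROGRAM, "f", 4, "total")

# Sparse reward: a near-miss trace earns nothing from the outcome-only reward.
print(outcome_reward("13", result), stepwise_score([0, 1, 5, 13], steps))

# Reward hacking: a correct final answer reached through inconsistent steps
# earns full outcome reward even though half of the trace is wrong.
print(outcome_reward("14", result), stepwise_score([0, 2, 6, 14], steps))
```

Under these assumptions, the first prediction gets an outcome reward of 0.0 but a stepwise score of 0.75, while the second gets an outcome reward of 1.0 with a stepwise score of only 0.5. This is the kind of gap that a reward based solely on final outputs cannot see.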
