Scaling Reasoning Tokens via RL and Parallel Thinking: Evidence From Competitive Programming

Qianfan Zhang; Tianyu Guo; Xuandi Ren; Jiale Chen; Ming Ding; Ran Xin; Xia Xiao

arXiv:2604.01302·cs.CL·April 3, 2026


TL;DR

This paper studies how to scale reasoning token budgets for competitive programming models through two complementary approaches: training-time reinforcement learning (RL) and test-time parallel thinking.

Contribution

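The paper introduces a multi-round parallel thinking pipeline that distributes the token budget across threads and rounds of generation, verification, and refinement, together with two RL training strategies (verification RL warmup and randomized clipping) that shift the accuracy-versus-tokens training trajectory.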

Findings

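01. During RL training, validation accuracy is approximately log-linear in the average number of generated reasoning tokens over successive checkpoints.

02. Verification RL warmup raises the starting point of this training trajectory, while randomized clipping produces a steeper trend in the observed regime.

03. The full system surpasses GPT-5-high on 456 hard problems with fewer tokens.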
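To make the log-linear finding concrete, the following is a minimal sketch of fitting accuracy against the logarithm of the token count by ordinary least squares. The checkpoint values are synthetic and chosen only for illustration; they are not results from the paper, and the helper name `fit_log_linear` is hypothetical.

```python
import math

def fit_log_linear(tokens, accuracy):
    """Least-squares fit of accuracy ~= a + b * ln(tokens)."""
    xs = [math.log(t) for t in tokens]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic checkpoints (NOT taken from the paper): accuracy rises by
# 0.05 each time the average reasoning-token count doubles.
tokens = [8_000, 16_000, 32_000, 64_000]
accuracy = [0.10 + 0.05 * math.log2(t / 8_000) for t in tokens]

a, b = fit_log_linear(tokens, accuracy)

# Under a log-linear trend, each doubling of tokens adds b * ln(2) accuracy.
gain_per_doubling = b * math.log(2)
```

Under such a trend, a fixed accuracy gain requires multiplying the token count by a constant factor, which is one way to read the abstract's remark that scaling single-generation reasoning quickly becomes expensive.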

Abstract

We study how to scale reasoning token budgets for competitive programming through two complementary approaches: training-time reinforcement learning (RL) and test-time parallel thinking. During RL training, we observe an approximately log-linear relationship between validation accuracy and the average number of generated reasoning tokens over successive checkpoints, and show two ways to shift this training trajectory: verification RL warmup raises the starting point, while randomized clipping produces a steeper trend in the observed regime. As scaling single-generation reasoning during RL quickly becomes expensive under full attention, we introduce a multi-round parallel thinking pipeline that distributes the token budget across threads and rounds of generation, verification, and refinement. We train the model end-to-end on this pipeline to match the training objective to the test-time…
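The control flow of a multi-round parallel pipeline of the kind the abstract describes might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function `parallel_thinking`, its callable parameters, the toy integer "problem", and the final pick-the-best-score selection are all hypothetical, and token accounting is omitted.

```python
from typing import Callable, List, TypeVar

P = TypeVar("P")  # problem type
C = TypeVar("C")  # candidate-solution type

def parallel_thinking(
    problem: P,
    generate: Callable[[P, int], C],       # hypothetical: thread index -> draft
    verify: Callable[[P, C], float],       # hypothetical: higher score is better
    refine: Callable[[P, C, float], C],    # hypothetical: revise using the score
    num_threads: int,
    num_rounds: int,
) -> C:
    # Generation: each thread drafts an independent candidate.
    candidates: List[C] = [generate(problem, t) for t in range(num_threads)]
    for _ in range(num_rounds):
        # Verification: score every candidate.
        scores = [verify(problem, c) for c in candidates]
        # Refinement: each thread revises its candidate given its score.
        candidates = [refine(problem, c, s) for c, s in zip(candidates, scores)]
    # Assumed selection rule: return the candidate with the best final score.
    return max(candidates, key=lambda c: verify(problem, c))

# Toy stand-ins: the "problem" is a target integer and candidates are guesses.
best = parallel_thinking(
    problem=42,
    generate=lambda p, t: t * 10,
    verify=lambda p, c: -abs(p - c),
    refine=lambda p, c, s: c if s == 0 else c + (1 if c < p else -1),
    num_threads=4,
    num_rounds=3,
)
```

In a real system the three callables would be model calls, and the per-call token limits would determine how the total budget is split across threads and rounds.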
