Control-R: Towards controllable test-time scaling

Di Zhang; Weida Wang; Junxian Li; Xunzhi Wang; Jiatong Li; Jianbo Wu; Jingdi Lei; Haonan He; Peng Ye; Shufei Zhang; Wanli Ouyang; Yuqiang Li; and Dongzhan Zhou

arXiv:2506.00189 · cs.AI · June 3, 2025

TL;DR

This paper introduces Control-R, an approach to test-time controllable reasoning in large reasoning models that uses structured control signals and a new dataset to improve performance on complex problems.

Contribution

The paper presents Reasoning Control Fields (RCF) and the Control-R-4K dataset, which enable test-time adjustment of reasoning effort in large reasoning models, along with Conditional Distillation Finetuning (CDF), a finetuning method for improving that control.

Findings

01 Achieves state-of-the-art results on the AIME2024 and MATH500 benchmarks.

02 Enables a controllable reasoning process during inference.

03 Improves reasoning efficiency and accuracy with test-time control.

Abstract

This paper addresses the challenges of underthinking and overthinking in long chain-of-thought (CoT) reasoning for Large Reasoning Models (LRMs) by introducing Reasoning Control Fields (RCF), a novel test-time approach that injects structured control signals to guide reasoning from a tree search perspective. RCF enables models to adjust reasoning effort according to given control conditions when solving complex tasks. Additionally, we present the Control-R-4K dataset, which consists of challenging problems annotated with detailed reasoning processes and corresponding control fields. To further enhance reasoning control, we propose a Conditional Distillation Finetuning (CDF) method, which trains models, particularly Control-R-32B, to effectively adjust reasoning effort at test time. Experimental results on benchmarks such as AIME2024 and MATH500 demonstrate that our…

Taxonomy

Topics: Fault Detection and Control Systems · Neural Networks and Applications