CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation
Zhongyuan Peng, Caijun Xu, Changyi Xiao, Shibo Hong, Eli Zhang, Stephen Huang, Yixin Cao

TL;DR
This paper introduces CoDiQ, a framework that uses test-time scaling to control the difficulty of generated questions, producing challenging yet solvable questions that improve the reasoning performance of models trained on them.
Contribution
The paper presents CoDiQ, a method for fine-grained difficulty control in question generation, together with CoDiQ-Generator (developed from Qwen3-8B) and CoDiQ-Corpus, a large corpus for training reasoning models.
Findings
Generated questions are more challenging than those in existing benchmarks.
Training models on CoDiQ-Corpus improves reasoning performance.
CoDiQ enables scalable, controllable question difficulty at test time.
Abstract
Large Reasoning Models (LRMs) benefit substantially from training on challenging competition-level questions. However, existing automated question synthesis methods lack precise difficulty control, incur high computational costs, and struggle to generate competition-level questions at scale. In this paper, we propose CoDiQ (Controllable Difficult Question Generation), a novel framework enabling fine-grained difficulty control via test-time scaling while ensuring question solvability. Specifically, first, we identify a test-time scaling tendency (extended reasoning token budget boosts difficulty but reduces solvability) and the intrinsic properties defining the upper bound of a model's ability to generate valid, high-difficulty questions. Then, we develop CoDiQ-Generator from Qwen3-8B, which improves the upper bound of difficult question generation, making it particularly well-suited for…
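The trade-off named in the abstract (a larger reasoning token budget raises difficulty but lowers solvability) can be read as a constrained selection problem. The sketch below is only an illustration of that reading, not the paper's procedure: the `BudgetStats` type, the `pick_budget` function, the solvability threshold, and every number in `ILLUSTRATIVE` are hypothetical placeholders and not results from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BudgetStats:
    """Hypothetical summary of questions generated at one token budget."""
    token_budget: int      # reasoning tokens allowed during generation
    difficulty: float      # assumed difficulty score, higher is harder
    solvable_rate: float   # assumed fraction of questions judged solvable


def pick_budget(
    stats: Sequence[BudgetStats], min_solvable_rate: float
) -> Optional[BudgetStats]:
    """Return the hardest setting whose solvability meets the threshold.

    Returns None when no setting is solvable often enough.
    """
    feasible = [s for s in stats if s.solvable_rate >= min_solvable_rate]
    if not feasible:
        return None
    return max(feasible, key=lambda s: s.difficulty)


# Made-up numbers that only mimic the described tendency:
# difficulty rises and solvability falls as the budget grows.
ILLUSTRATIVE = [
    BudgetStats(token_budget=1024, difficulty=0.30, solvable_rate=0.95),
    BudgetStats(token_budget=4096, difficulty=0.55, solvable_rate=0.85),
    BudgetStats(token_budget=16384, difficulty=0.80, solvable_rate=0.60),
]

if __name__ == "__main__":
    choice = pick_budget(ILLUSTRATIVE, min_solvable_rate=0.8)
    print(choice.token_budget if choice else "no feasible budget")
```

Under these placeholder values, tightening the solvability threshold pushes the selection toward smaller budgets, and loosening it allows harder but less reliably solvable questions.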
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
