CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation

Zhongyuan Peng; Caijun Xu; Changyi Xiao; Shibo Hong; Eli Zhang; Stephen Huang; Yixin Cao

arXiv:2602.01660·cs.CL·February 3, 2026

Open Access · 1 Model · 1 Dataset

TL;DR

This paper introduces CoDiQ, a framework that uses test-time scaling to control the difficulty of generated questions, producing challenging yet solvable questions that improve the reasoning performance of large reasoning models when used for training.

Contribution

The paper presents CoDiQ, a method for fine-grained difficulty control in question generation, together with CoDiQ-Generator (developed from Qwen3-8B) and the CoDiQ-Corpus dataset for training reasoning models.

Findings

01 Generated questions are more challenging than existing benchmarks.

02 Training models on CoDiQ-Corpus improves reasoning performance.

03 CoDiQ enables scalable, controllable question difficulty at test time.

Abstract

Large Reasoning Models (LRMs) benefit substantially from training on challenging competition-level questions. However, existing automated question synthesis methods lack precise difficulty control, incur high computational costs, and struggle to generate competition-level questions at scale. In this paper, we propose CoDiQ (Controllable Difficult Question Generation), a novel framework enabling fine-grained difficulty control via test-time scaling while ensuring question solvability. Specifically, first, we identify a test-time scaling tendency (extended reasoning token budget boosts difficulty but reduces solvability) and the intrinsic properties defining the upper bound of a model's ability to generate valid, high-difficulty questions. Then, we develop CoDiQ-Generator from Qwen3-8B, which improves the upper bound of difficult question generation, making it particularly well-suited for…
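
The trade-off named in the abstract (a larger reasoning token budget raises difficulty but lowers solvability) can be illustrated with a toy selection rule. This is a minimal sketch and not the paper's method: the `BudgetStats` record, the `pick_budget` helper, the solvability floor, and every number in `toy` are hypothetical placeholders invented for illustration, not measurements from CoDiQ.

```python
from typing import NamedTuple, Optional, Sequence


class BudgetStats(NamedTuple):
    """Hypothetical summary of questions generated at one token budget."""
    token_budget: int     # reasoning tokens allowed during generation
    difficulty: float     # assumed difficulty score, higher means harder
    solvable_rate: float  # assumed fraction of questions judged solvable


def pick_budget(stats: Sequence[BudgetStats],
                min_solvable: float) -> Optional[BudgetStats]:
    """Return the hardest setting whose solvable rate meets the floor."""
    eligible = [s for s in stats if s.solvable_rate >= min_solvable]
    if not eligible:
        return None  # no budget satisfies the solvability constraint
    return max(eligible, key=lambda s: s.difficulty)


# Placeholder values only: difficulty rises and solvability falls with budget.
toy = [
    BudgetStats(token_budget=1024, difficulty=0.2, solvable_rate=0.95),
    BudgetStats(token_budget=4096, difficulty=0.5, solvable_rate=0.85),
    BudgetStats(token_budget=16384, difficulty=0.8, solvable_rate=0.55),
]
choice = pick_budget(toy, min_solvable=0.8)
```

With these placeholder values, the rule skips the largest budget because its assumed solvable rate falls below the floor, and settles on the middle one. Relaxing `min_solvable` shifts the choice toward larger budgets.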

Code & Models

Models

AleXGroup/CoDiQ-Gen-8B
model · 1 download

Datasets

AleXGroup/CoDiQ-Corpus
dataset · 15 downloads

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning