DART: Difficulty-Adaptive Reasoning Truncation for Efficient Large Language Models
Ruofan Zhang, Bin Xia, Zhen Cheng, Cairen Jian, Minglun Yang, Ngai Wong, Yuan Cheng

TL;DR
DART introduces a supervised framework that adaptively truncates reasoning in large language models based on problem difficulty, significantly improving efficiency while maintaining or enhancing accuracy across mathematical benchmarks.
Contribution
It presents a novel difficulty-adaptive reasoning truncation method that learns when to stop thinking, improving efficiency without sacrificing accuracy in LLM reasoning tasks.
Findings
Achieves 81.2% reasoning truncation on the GSM8K dataset.
Provides 5.33× computational acceleration.
Maintains or improves reasoning accuracy.
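As a rough sanity check on how the first two findings relate, the sketch below converts a truncation fraction into an idealized speedup. It assumes, hypothetically, that "truncation" means the fraction of generated reasoning tokens removed and that compute scales linearly with generated tokens; the source states neither, and the function name is illustrative.

```python
def idealized_speedup(truncation_fraction: float) -> float:
    """Speedup if cost were exactly proportional to the tokens that remain.

    Assumption (not from the paper): removing a fraction f of the tokens
    leaves (1 - f) of the original cost, so the speedup is 1 / (1 - f).
    """
    if not 0.0 <= truncation_fraction < 1.0:
        raise ValueError("truncation_fraction must be in [0, 1)")
    return 1.0 / (1.0 - truncation_fraction)


# 81.2% truncation leaves 18.8% of the tokens.
print(round(idealized_speedup(0.812), 2))  # 5.32
```

Under these assumptions the idealized figure is close to the reported 5.33×, which suggests the two numbers describe the same token reduction, though the paper's exact measurement protocol is not given here.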
Abstract
Adaptive reasoning is essential for aligning the computational effort of large language models (LLMs) with the intrinsic difficulty of problems. Current chain-of-thought methods boost reasoning ability but indiscriminately generate long explanations, leading to evident inefficiency. However, existing reinforcement learning approaches to adaptive thinking remain unstable and heavily reward-dependent. Here we propose DART, a supervised Difficulty-Adaptive Reasoning Truncation framework that adjusts thinking length according to problem difficulty. By distilling concise reasoning patterns from stronger models, interpolating them into a continuum of reasoning styles, and curating optimal training data that balances correctness and compactness, DART learns when to "stop thinking". Across multiple mathematical benchmarks, experimental results…
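The interpolation and curation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the "continuum of reasoning styles" comes from linearly blending the weights of a long-reasoning model and a short-reasoning model with a fusion coefficient `alpha`, and that curation keeps the shortest correct response per problem. All names (`interpolate_weights`, `Candidate`, `pick_training_target`) are hypothetical, and weights are plain Python lists rather than real tensors.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Weights = Dict[str, List[float]]


def interpolate_weights(long_w: Weights, short_w: Weights, alpha: float) -> Weights:
    """Blend two parameter sets; alpha=0 gives long_w, alpha=1 gives short_w."""
    if long_w.keys() != short_w.keys():
        raise ValueError("both models must have the same parameter names")
    return {
        name: [(1.0 - alpha) * a + alpha * b
               for a, b in zip(long_w[name], short_w[name])]
        for name in long_w
    }


@dataclass
class Candidate:
    alpha: float       # fusion coefficient of the model that produced it
    response: str      # generated reasoning plus answer
    num_tokens: int    # length of the response
    is_correct: bool   # whether the final answer matched the reference


def pick_training_target(candidates: List[Candidate]) -> Optional[Candidate]:
    """Keep the shortest correct response, or None if nothing was correct."""
    correct = [c for c in candidates if c.is_correct]
    return min(correct, key=lambda c: c.num_tokens) if correct else None


# Toy usage: three models along the spectrum answer one problem.
blended = interpolate_weights({"w": [0.0, 2.0]}, {"w": [4.0, 2.0]}, alpha=0.25)
pool = [
    Candidate(0.0, "long chain ... 42", 300, True),
    Candidate(0.5, "medium chain ... 42", 120, True),
    Candidate(1.0, "41", 5, False),  # shortest, but wrong, so it is skipped
]
target = pick_training_target(pool)
```

In this toy setup an easy problem would tend to be solved correctly at high `alpha` and so receive a short target, while a hard problem would fall back to a longer one, which is one way a supervised dataset could encode difficulty-dependent length.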
Peer Reviews
Decision · Submitted to ICLR 2026
The paper focuses on an important problem—the inefficiency of CoT. The proposed framework is presented as modular and is conceptually sound.
- Clarity issues. Some parts of the methodology are not clearly explained. For example, it is unclear how the distillation teacher model shortens long reasoning chains and how this process affects the quality of the reasoning paths. Additionally, the paper does not discuss how the quality of the generated reasoning chains is controlled or verified.
- Limited novelty. The proposed method essentially involves collecting question-answer pairs with varying reasoning lengths and using this dataset to
- The paper addresses an important problem: improving reasoning efficiency for large language models.
- The four-step framework (distilling short CoTs, interpolation, creating a model spectrum, curating training data, adaptive training) is clearly structured and easy to follow.
- The experiments cover several standard mathematical reasoning benchmarks and include analyses of certain hyperparameters, such as fusion coefficients and sampling density.
- Limited novelty. The idea of adaptive, difficulty-aware reasoning is not new, and prior work, such as CoT-Valve, has already explored similar strategies for interpolating model weights and curating adaptive data based on correctness.
- The method appears less effective on DeepSeek-R1-Distill-Qwen-7B. On benchmarks such as GSM8K, MATH-500, and OLYMPIAD, the generated token length is reduced, but the accuracy also drops.
- The short-CoT data generated from DeepSeek-R1-Distill-Qwen-7B is importan
The writing is straightforward and easy to understand. The paper proposes an angle for training efficient LRMs based on different difficulty levels. The experiments show that the method yields some improvements across different models with reduced generation length.
It is not very clear what the advantage is of using extrapolation to generate responses of different lengths for different difficulty levels. I understand that extrapolation could help control the generation length, which can then be used to select the data included in the final training. It is not clear how this extrapolation-based data generation method works compared with a prompt-based method for generating responses of different lengths. Lack of experi
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
