Reasoning Steps as Curriculum: Using Depth of Thought as a Difficulty Signal for Tuning LLMs
Jeesu Jung, Sangkeun Jung

TL;DR
This paper proposes depth of thought (DoT), measured as the number of discrete steps in a teacher model's reasoning trace, as a difficulty signal for curriculum learning when tuning large language models, with the aim of improving reasoning performance.
Contribution
It proposes depth of thought as a scalable, interpretable difficulty measure for curriculum learning, states three testable hypotheses about it, and outlines an evaluation framework for testing them.
Findings
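Depth of thought (DoT) correlates with conventional difficulty on reasoning benchmarks
Curricula ordered by DoT outperform length- or judge-scored curricula under matched budgets
The DoT difficulty measure is robust across teacher models given light formatting controls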
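One way to check the first point is a rank correlation between per-example DoT and a conventional difficulty label. The sketch below is a minimal illustration, not the authors' evaluation code: the function names `average_ranks` and `spearman` are hypothetical, the choice of Spearman correlation is an assumption, and no data from the paper is used.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

In use, `x` would hold the DoT of each example and `y` its benchmark difficulty level (both assumed to be available per example); a value near 1 would be consistent with the hypothesis, while a value near 0 would not.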
Abstract
Curriculum learning for training LLMs requires a difficulty signal that aligns with reasoning while remaining scalable and interpretable. We propose a simple premise: tasks that demand greater depth of thought from humans should also be harder for models. Accordingly, we define difficulty as depth of thought (DoT) and operationalize it by counting the discrete steps in a teacher model's reasoning trace (e.g., Chain-of-Thought). We then train with a shallow-to-deep curriculum ordered by this DoT and outline how to derive, validate, and schedule it at scale. Our position yields three testable hypotheses: (i) DoT correlates with conventional difficulty on reasoning benchmarks, (ii) DoT-ordered curricula outperform length- or judge-scored curricula under matched budgets, and (iii) the difficulty is robust across teacher models given light formatting controls. We propose an evaluation…
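The operationalization described in the abstract (count the discrete steps in a teacher trace, then order examples from shallow to deep) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the step delimiter (one step per non-empty line, with optional "Step 3:", "3.", or bullet prefixes), the `trace` field name, and the helper names `count_steps` and `order_shallow_to_deep` are all hypothetical.

```python
import re
from typing import Dict, List

# Assumed formatting control: each reasoning step sits on its own line,
# optionally prefixed by "Step 3:", "3.", "3)", "-", or "*".
_STEP_PREFIX = re.compile(
    r"^\s*(?:step\s*\d+\s*[:.)]|\d+\s*[.)]|[-*])\s*", re.IGNORECASE
)


def count_steps(trace: str) -> int:
    """Depth of thought: number of non-empty step lines in a reasoning trace."""
    steps = 0
    for line in trace.splitlines():
        # Strip the step marker so a bare "Step 2:" with no content is not counted.
        body = _STEP_PREFIX.sub("", line).strip()
        if body:
            steps += 1
    return steps


def order_shallow_to_deep(examples: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Sort examples by ascending DoT; ties keep their original order."""
    return sorted(examples, key=lambda ex: count_steps(ex["trace"]))
```

Because `sorted` is stable, examples with equal DoT retain their input order, so a secondary ordering (for instance a shuffle within each depth level) can be applied beforehand without being disturbed.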
