Hint Tuning: Less Data Makes Better Reasoners
Siqi Fan, Minghao Li, Xiaoqian Ma, Xiusheng Huang, Zhuo Chen, Bowen Qin, Liujie Zhang, Shuo Shang, Weihang Chen

TL;DR
Hint Tuning is a data-efficient method that calibrates reasoning depth in large models, reducing token usage by 24-66% while maintaining accuracy, without extensive data or RL.
Contribution
We introduce Hint Tuning, a novel approach that uses instruct models as difficulty probes to automatically generate training data for better reasoning calibration.
Findings
Achieves 24-66% token reduction across models and scales.
Maintains competitive accuracy on five benchmarks.
Uses only 1K self-annotated samples for training.
Abstract
Large reasoning models achieve high accuracy through extended chain-of-thought but generate 5-8× more tokens than necessary, applying verbose reasoning uniformly regardless of problem difficulty. We propose Hint Tuning, a data-efficient approach that teaches models to calibrate reasoning depth. Our key insight: the corresponding instruct model serves as an ideal difficulty probe. By testing what the instruct model can solve with varying guidance, we automatically construct training data across three states: No-Hint (direct answer), Sparse-Hint (minimal prefix), and Full-Hint (complete reasoning). This converts the abstract challenge of difficulty labeling into a measurable consistency check between the instruct and reasoning models. With only 1K self-annotated samples, Hint Tuning achieves 24-66% token reduction (31.5% average) across mainstream reasoning models (Qwen3-Thinking,…
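The three-state labeling described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation; it is one plausible reading of the abstract. The names `sparse_prefix`, `assign_hint_state`, and `toy_probe`, the callable interface for the instruct model, the exact-match consistency check, and the 20% prefix length are all assumptions introduced here for illustration.

```python
from typing import Callable

# Hypothetical labels for the three states named in the abstract.
NO_HINT, SPARSE_HINT, FULL_HINT = "no_hint", "sparse_hint", "full_hint"


def sparse_prefix(trace: str, fraction: float = 0.2) -> str:
    """Keep the first `fraction` of the words in a reasoning trace.

    The 20% default is an assumption, not a value from the paper.
    """
    words = trace.split()
    if not words:
        return ""
    keep = max(1, int(len(words) * fraction))
    return " ".join(words[:keep])


def assign_hint_state(
    question: str,
    reasoning_trace: str,
    reasoning_answer: str,
    instruct_solve: Callable[[str, str], str],
    fraction: float = 0.2,
) -> str:
    """Label a sample by how much guidance the instruct model needs.

    `instruct_solve(question, hint)` is a hypothetical wrapper around the
    instruct model that returns its final answer as a string. Consistency
    is checked here by exact match with the reasoning model's answer,
    which is a simplification.
    """
    # Solved with no guidance: a direct answer is enough.
    if instruct_solve(question, "") == reasoning_answer:
        return NO_HINT
    # Solved with a short prefix of the reasoning trace.
    hint = sparse_prefix(reasoning_trace, fraction)
    if instruct_solve(question, hint) == reasoning_answer:
        return SPARSE_HINT
    # Otherwise keep the complete reasoning.
    return FULL_HINT


def toy_probe(question: str, hint: str) -> str:
    """Stand-in for an instruct model, used only to exercise the logic."""
    if question == "easy":
        return "4"
    if question == "medium" and hint:
        return "9"
    return "?"
```

With the stand-in probe, `assign_hint_state("easy", "two plus two is four", "4", toy_probe)` falls into the No-Hint state, while a question the probe never solves falls through to Full-Hint. In practice `instruct_solve` would call a real model, and answer comparison would likely need normalization rather than exact string equality.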
