Hint Tuning: Less Data Makes Better Reasoners

Siqi Fan; Minghao Li; Xiaoqian Ma; Xiusheng Huang; Zhuo Chen; Bowen Qin; Liujie Zhang; Shuo Shang; Weihang Chen

arXiv:2605.08665 · cs.CL · May 12, 2026

TL;DR

Hint Tuning is a data-efficient method that calibrates reasoning depth in large models, reducing token usage by 24-66% while maintaining accuracy, without extensive data or RL.

Contribution

We introduce Hint Tuning, a novel approach that uses instruct models as difficulty probes to automatically generate training data for better reasoning calibration.

Findings

01. Achieves 24-66% token reduction across models and scales.

02. Maintains competitive accuracy on five benchmarks.

03. Uses only 1K self-annotated samples for training.

Abstract

Large reasoning models achieve high accuracy through extended chain-of-thought but generate 5-8× more tokens than necessary, applying verbose reasoning uniformly regardless of problem difficulty. We propose Hint Tuning, a data-efficient approach that teaches models to calibrate reasoning depth. Our key insight: the corresponding instruct model serves as an ideal difficulty probe. By testing what the instruct model can solve with varying guidance, we automatically construct training data across three states: No-Hint (direct answer), Sparse-Hint (minimal prefix), and Full-Hint (complete reasoning). This converts the abstract challenge of difficulty labeling into a measurable consistency check between the instruct and reasoning models. With only 1K self-annotated samples, Hint Tuning achieves 24-66% token reduction (31.5% average) across mainstream reasoning models (Qwen3-Thinking,…
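The three-state labeling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `label_hint_state`, the `instruct_solve` callable, the `sparse_fraction` parameter, and the choice to build the sparse hint from the first steps of the reasoning model's trace are all assumptions made for illustration. The sketch also assumes the reasoning model's answer is the reference against which the instruct model's answer is checked for consistency.

```python
from typing import Callable, Dict, List

# Hypothetical interface: (question, hint_prefix) -> answer string.
# In practice this would wrap a call to the instruct model.
Solver = Callable[[str, str], str]


def label_hint_state(
    question: str,
    reference_answer: str,        # assumed: the reasoning model's final answer
    reasoning_steps: List[str],   # assumed: the reasoning model's trace, split into steps
    instruct_solve: Solver,
    sparse_fraction: float = 0.2,  # assumed: share of the trace used as a sparse hint
) -> Dict[str, str]:
    """Assign No-Hint, Sparse-Hint, or Full-Hint by probing the instruct model."""
    # No-Hint: the instruct model already agrees with the reference unaided.
    if instruct_solve(question, "") == reference_answer:
        return {"state": "No-Hint", "hint": ""}

    # Sparse-Hint: a short prefix of the trace is enough guidance.
    k = max(1, int(len(reasoning_steps) * sparse_fraction))
    prefix = " ".join(reasoning_steps[:k])
    if instruct_solve(question, prefix) == reference_answer:
        return {"state": "Sparse-Hint", "hint": prefix}

    # Full-Hint: the problem needs the complete reasoning.
    return {"state": "Full-Hint", "hint": " ".join(reasoning_steps)}


def toy_solver(question: str, hint: str) -> str:
    """Stand-in for an instruct model, used only to exercise the sketch."""
    if question == "2+2":
        return "4"
    if question == "17*3" and "17*3 = 51" in hint:
        return "51"
    return "?"


easy = label_hint_state("2+2", "4", ["2+2 = 4"], toy_solver)
medium = label_hint_state("17*3", "51", ["17*3 = 51", "so the answer is 51"], toy_solver)
hard = label_hint_state("hard", "x", ["a", "b", "c"], toy_solver)
```

Each labeled sample pairs a question with the shortest guidance that makes the instruct model consistent with the reference, which is one way to read "calibrating reasoning depth" as a training target; how the paper actually formats these samples for fine-tuning is not specified in the abstract.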
