TL;DR
CRISP is a self-distillation method that trains reasoning models to reason more concisely, cutting token usage by more than half on MATH-500 while improving accuracy, and it generalizes across model families and to multi-step planning tasks.
Contribution
CRISP introduces a simple self-distillation approach that automatically compresses reasoning, improving both efficiency and accuracy without ground-truth answers, token budgets, or difficulty estimators.
Findings
On Qwen3-8B and Qwen3-14B, achieves 57-59% token reduction on MATH-500 while improving accuracy by 9-16 points absolute.
On AIME 2024, the 14B model gains 10 points with 41% compression.
Generalizes across model families and transfers to multi-step planning tasks.
Abstract
Reasoning models think out loud, but much of what they say is noise. We introduce CRISP (Compressed Reasoning via Iterative Self-Policy Distillation), a method that teaches models to reason more concisely by distilling their own concise behavior back into themselves. The entire approach reduces to one idea: condition the same model on a "be concise" instruction to obtain teacher logits, and minimize per-token reverse KL on the student's own rollouts. No ground-truth answers, no token budgets, no difficulty estimators. Just self-distillation. Yet this simplicity belies surprising sophistication: CRISP automatically compresses easy problems aggressively while preserving the deliberation needed for hard ones. On Qwen3-8B and Qwen3-14B, we achieve 57-59% token reduction on MATH-500 while improving accuracy by 9-16 points absolute. On AIME 2024, the 14B model gains 10 points with 41%…
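The per-token reverse KL objective named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and toy logits below are hypothetical, the vocabulary has only three entries, and it assumes the student is scored without the conciseness instruction while the teacher is the same model scored with it, both on the same student-generated rollout.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position."""
    log_p = log_softmax(student_logits)  # student distribution (log-probs)
    log_q = log_softmax(teacher_logits)  # teacher distribution (log-probs)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def per_token_reverse_kl_loss(student_seq, teacher_seq):
    """Mean reverse KL over the positions of one student rollout."""
    kls = [reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(kls) / len(kls)

# Toy rollout with two positions (hypothetical logits).
# Position 0: teacher agrees with the student, so it contributes zero loss.
# Position 1: teacher prefers token 0 more sharply than the student does.
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher = [[2.0, 0.5, -1.0], [1.5, 0.2, -0.7]]
loss = per_token_reverse_kl_loss(student, teacher)
```

In an actual training loop the logits would come from two forward passes of the same model over the student's sampled tokens, and the loss would be backpropagated through the student pass only; those details are assumptions here, since the abstract does not spell them out.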
