CRISP: Compressed Reasoning via Iterative Self-Policy Distillation

Hejian Sang; Yuanda Xu; Zhengze Zhou; Ran He; Zhipeng Wang; Jiachen Sun

arXiv:2603.05433 · cs.LG · April 14, 2026

TL;DR

CRISP is a self-distillation method that trains reasoning models to reason more concisely, cutting token usage by 57-59% on MATH-500 while improving accuracy, and it generalizes across model families and tasks.

Contribution

The paper introduces a simple self-distillation approach that automatically compresses reasoning, improving both efficiency and accuracy without requiring ground-truth answers, token budgets, or difficulty estimators.

Findings

01

Achieves 57-59% token reduction on MATH-500 with Qwen3-8B and Qwen3-14B, while improving accuracy by 9-16 points absolute.

02

The 14B model gains 10 points on AIME 2024 with 41% compression.

03

Generalizes across model families and transfers to multi-step planning tasks.

Abstract

Reasoning models think out loud, but much of what they say is noise. We introduce CRISP (Compressed Reasoning via Iterative Self-Policy Distillation), a method that teaches models to reason more concisely by distilling their own concise behavior back into themselves. The entire approach reduces to one idea: condition the same model on a "be concise" instruction to obtain teacher logits, and minimize per-token reverse KL on the student's own rollouts. No ground-truth answers, no token budgets, no difficulty estimators. Just self-distillation. Yet this simplicity belies surprising sophistication: CRISP automatically compresses easy problems aggressively while preserving the deliberation needed for hard ones. On Qwen3-8B and Qwen3-14B, we achieve 57-59% token reduction on MATH-500 while improving accuracy by 9-16 points absolute. On AIME 2024, the 14B model gains 10 points with 41%…
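The per-token reverse KL named in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it only shows how KL(student || teacher) is computed at each token position from two sets of logits. The function names, the random stand-in logits, and the choice to average over positions are all assumptions made for illustration. In the setting the abstract describes, the student logits would come from the model on the plain prompt and the teacher logits from the same model conditioned on the "be concise" instruction, both scored on the student's own rollout tokens.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def per_token_reverse_kl(student_logits, teacher_logits):
    # Reverse KL, i.e. KL(student || teacher), at each token position.
    # Both inputs have shape (seq_len, vocab_size).
    log_p_s = log_softmax(student_logits)
    log_p_t = log_softmax(teacher_logits)
    return (np.exp(log_p_s) * (log_p_s - log_p_t)).sum(axis=-1)

# Hypothetical stand-ins for real model outputs (assumption: random logits).
rng = np.random.default_rng(0)
seq_len, vocab_size = 5, 11
student_logits = rng.normal(size=(seq_len, vocab_size))
teacher_logits = rng.normal(size=(seq_len, vocab_size))

kl = per_token_reverse_kl(student_logits, teacher_logits)
loss = kl.mean()  # assumption: average over rollout positions
```

Reverse KL takes the expectation under the student's distribution, so it penalizes the student for placing probability mass where the teacher places little. In a real training loop the teacher logits would presumably be treated as fixed targets, though the abstract does not state this.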

Code & Models

Repositories

HJSang/OPSD_Reasoning_Compression (GitHub)
