STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes

Chenjun Xu; Zhennan Zhou; Zhan Su; Bill Howe; Lucy Lu Wang; Bingbing Wen

arXiv:2605.13165 · cs.CL · May 14, 2026

TL;DR

This paper introduces STOP, a structured on-policy pruning method that shortens reasoning traces and lowers inference cost in low-data regimes while largely preserving accuracy.

Contribution

We propose STOP, an on-policy pruning algorithm that constructs structured reasoning interfaces and retains minimal reasoning traces, improving efficiency without sacrificing accuracy.

Findings

01. STOP reduces generated tokens by up to 42.4%.

02. STOP largely preserves accuracy in low-data fine-tuning.

03. STOP induces a smaller distributional shift than teacher-guided pruning.

Abstract

Long chain-of-thought (Long CoT) reasoning improves performance on multi-step problems, but it also induces overthinking: models often generate low-yield reasoning that increases inference cost and latency. This inefficiency is especially problematic in low-data fine-tuning regimes, where real applications adapt reasoning models with limited supervision and cannot rely on large-scale teacher distillation or heavy test-time control. To address this, we propose STOP (Structured On-policy Pruning), an on-policy algorithm for analyzing and pruning long-form reasoning traces. STOP constructs self-distilled traces from the model. Then it maps each trace into a structured reasoning interface through node segmentation, taxonomy annotation, and reasoning-tree construction. On top of this interface, we introduce ECN (Earliest Correct Node), which retains the shortest prefix ending at the earliest…
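
The pipeline described in the abstract (segment a trace into nodes, annotate each node, then keep a short prefix) can be illustrated with a small Python sketch. This is not the authors' implementation. The abstract is cut off before it finishes defining ECN, so the stopping criterion used here (the first node whose stated answer matches the reference answer) is an assumption, as are the `Node` fields, the taxonomy labels, the blank-line segmentation rule, and the names `segment` and `ecn_prefix`. The sketch also uses a flat node list rather than a reasoning tree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One segmented reasoning step (hypothetical structure, not from the paper)."""
    text: str
    label: str             # taxonomy tag; the label names below are made up
    answer: Optional[str]  # answer stated at this node, if any


def segment(trace: str) -> List[str]:
    """Split a trace on blank lines; a stand-in for the paper's node segmentation."""
    return [part.strip() for part in trace.split("\n\n") if part.strip()]


def ecn_prefix(nodes: List[Node], gold: str) -> List[Node]:
    """Return the shortest prefix ending at the first node that states `gold`.

    Assumed criterion: a node counts as correct when its stated answer equals
    the reference answer. If no node qualifies, the trace is left unpruned.
    """
    for i, node in enumerate(nodes):
        if node.answer is not None and node.answer.strip() == gold.strip():
            return nodes[: i + 1]
    return nodes


# Toy trace: the answer is already reached at the second node,
# so the later re-check and restatement are pruned away.
nodes = [
    Node("Let x = 3 + 4.", "derive", None),
    Node("So x = 7.", "conclude", "7"),
    Node("Wait, let me double-check: 3 + 4 = 7.", "verify", "7"),
    Node("Final answer: 7.", "conclude", "7"),
]
pruned = ecn_prefix(nodes, gold="7")
print(len(nodes), len(pruned))
```

In this toy case the pruned trace keeps only the first two nodes. Falling back to the full trace when no node matches is one possible design choice; the source does not say how such traces are handled.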
