SSPO: Self-traced Step-wise Preference Optimization for Process Supervision and Reasoning Compression
Yuyang Xu, Yi Cheng, Haochao Ying, Zhuoyun Du, Renjun Hu, Xing Shi, Wei Lin, Jian Wu

TL;DR
SSPO is a novel reinforcement learning framework that improves large language model reasoning by optimizing each step using self-generated preferences, reducing overthinking and enhancing reasoning accuracy and conciseness.
Contribution
The paper introduces SSPO, a pluggable, self-supervised RL method that optimizes individual reasoning steps without auxiliary models or manual annotations, improving efficiency and reasoning quality.
Findings
Reasoning sequences are more accurate and succinct with SSPO.
SSPO effectively reduces overthinking behaviors across multiple domains.
Model performance remains uncompromised while reasoning is compressed.
Abstract
Test-time scaling has proven effective in further enhancing the performance of pretrained Large Language Models (LLMs). However, mainstream post-training methods (i.e., reinforcement learning (RL) with chain-of-thought (CoT) reasoning) often incur substantial computational overhead due to auxiliary models and overthinking. In this paper, we empirically reveal that incorrect answers partially stem from verbose reasoning processes that lack effective self-correction, where errors accumulate across multiple reasoning steps. To this end, we propose Self-traced Step-wise Preference Optimization (SSPO), a pluggable RL process supervision framework that enables fine-grained optimization of each reasoning step. Specifically, SSPO requires neither auxiliary models nor stepwise manual annotations. Instead, it leverages step-wise preference signals generated by the model itself to guide the optimization…
Taxonomy
Topics: Business Process Modeling and Analysis · Semantic Web and Ontologies · Service-Oriented Architecture and Web Services
