SSPO: Self-traced Step-wise Preference Optimization for Process Supervision and Reasoning Compression