PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment

Jiawei Li, Xinyue Liang, Junlong Zhang, Yizhe Yang, Chong Feng, Yang Gao

arXiv:2411.11681 · cs.AI · May 15, 2025

TL;DR

This paper introduces PSPO*, a process supervision framework for reasoning tasks in large language models that uses nonlinear reward shaping to improve reasoning accuracy and reduce logical errors and redundant reasoning.

Contribution

The paper proposes PSPO*, a novel process supervision paradigm that covers the workflow from reward model training to policy optimization and uses nonlinear reward functions that account for both the accuracy and the number of reasoning steps.

Findings

01. PSPO-WRS outperforms existing models on six reasoning datasets.
02. Nonlinear reward shaping improves reasoning accuracy.
03. Accounting for the number of reasoning steps in reward design improves model performance.

Abstract

Process supervision enhances the performance of large language models in reasoning tasks by providing feedback at each step of chain-of-thought reasoning. However, due to the lack of effective process supervision methods, even advanced large language models are prone to logical errors and redundant reasoning. We claim that the effectiveness of process supervision significantly depends on both the accuracy and the length of reasoning chains. Moreover, we identify that these factors exhibit a nonlinear relationship with the overall reward score of the reasoning process. Inspired by these insights, we propose a novel process supervision paradigm, PSPO*, which systematically outlines the workflow from reward model training to policy optimization, and highlights the importance of nonlinear rewards in process supervision. Based on PSPO*, we develop the PSPO-WRS, which considers the number of…
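The abstract's central claim is that step accuracy and chain length relate nonlinearly to the overall reward of a reasoning process. The sketch below contrasts a linear aggregate of per-step scores with a nonlinear one. It assumes per-step scores in [0, 1] come from some process reward model; the names `linear_reward`, `nonlinear_reward`, `target_len`, and `alpha`, and the specific shape (an accuracy exponent times a bell-shaped length factor), are hypothetical illustrations chosen for clarity, not the PSPO-WRS reward function described in the paper.

```python
import math
from statistics import fmean
from typing import Sequence


def linear_reward(step_scores: Sequence[float]) -> float:
    """Linear baseline: average per-step correctness, blind to chain length."""
    return fmean(step_scores)


def nonlinear_reward(
    step_scores: Sequence[float],
    target_len: int = 5,  # hypothetical "ideal" number of reasoning steps
    alpha: float = 2.0,   # hypothetical exponent; alpha > 1 widens the gap
                          # between near-perfect and mediocre chains
) -> float:
    """Hypothetical nonlinear reward over per-step scores in [0, 1].

    Illustrative shape only; this is NOT the PSPO-WRS formula.
    """
    if not step_scores:
        raise ValueError("need at least one reasoning step")
    n = len(step_scores)
    accuracy = fmean(step_scores)
    # Nonlinear in accuracy: raising to alpha > 1 shrinks low scores
    # much more than high ones.
    accuracy_term = accuracy ** alpha
    # Nonlinear in length: equals 1.0 at target_len and decays for
    # chains that are shorter or longer (e.g. padded with redundant steps).
    length_term = math.exp(-(((n - target_len) / target_len) ** 2))
    return accuracy_term * length_term


if __name__ == "__main__":
    concise = [0.9] * 5     # five steps, each scored 0.9
    redundant = [0.9] * 10  # same per-step quality, twice as many steps
    print(linear_reward(concise), linear_reward(redundant))
    print(nonlinear_reward(concise), nonlinear_reward(redundant))
```

With these defaults, a 5-step and a 10-step chain whose steps all score 0.9 receive the same linear reward (0.9), while the nonlinear version gives 0.81 to the concise chain and about 0.30 to the redundant one. This is the kind of length sensitivity the abstract argues a linear aggregate misses.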

Code & Models

Repositories

direct-bit/pspo (PyTorch, official)
