RSPO: Risk-Seeking Policy Optimization for Pass@k and Max@k Metrics in Large Language Models
Kaichen Zhang, Shenghao Gao, Yuzhong Hong, Haipeng Sun, Junwei Bao, Hongfei Jiang, Yang Song, Hong Dingqian, Hui Xiong

TL;DR
RSPO introduces a novel training method for large language models that directly optimizes risk-seeking metrics like Pass@k and Max@k, addressing the mismatch between training objectives and evaluation metrics.
Contribution
The paper proposes RSPO, an approach that optimizes Pass@k and Max@k directly and overcomes the hitchhiking problem by using unbiased gradient estimators.
Findings
RSPO achieves superior performance on Pass@k and Max@k metrics.
The method provides unbiased gradient estimates despite complex nested gradients.
Theoretical analysis confirms the effectiveness of RSPO in large language models.
Abstract
Current large language model post-training optimizes a risk-neutral objective that maximizes expected reward, yet evaluation relies heavily on risk-seeking metrics like Pass@k (at least one success in k trials) and Max@k (maximum reward across k responses). This mismatch in risk preferences can lead to suboptimal performance. To bridge this gap, we propose Risk-Seeking Policy Optimization (RSPO), a novel method that directly targets Pass@k and Max@k during training. A key challenge in optimizing these metrics is the "hitchhiking" problem: low-reward responses are inadvertently reinforced if they co-occur with a high-reward response within a sample of k generations, resulting in inefficient optimization. RSPO addresses this problem by leveraging the closed-form probability that a given response is the maximum among k samplings. Despite the complexity of nested gradients over…
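To make the two metrics in the abstract concrete, here is a minimal sketch of how Pass@k and Max@k can be estimated from n sampled responses. It uses standard combinatorial estimators and is not taken from the paper: the function names `pass_at_k` and `max_at_k` are hypothetical, and the code assumes the k responses are drawn without replacement from the n available samples. The weight `comb(i, k - 1) / comb(n, k)` is the probability that the response at sorted position i is the maximum of a random size-k subset, which illustrates the kind of closed-form "maximum among k" probability the abstract refers to. RSPO's actual gradient estimator may differ.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k from n samples of which c are correct.

    Assumption (not from the paper): k responses are drawn without
    replacement from the n samples, so the chance that all k fail is
    C(n - c, k) / C(n, k).
    """
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def max_at_k(rewards: list[float], k: int) -> float:
    """Estimate Max@k: the expected maximum reward over a random size-k subset.

    After sorting ascending, the response at 0-based position i is the
    maximum of a subset exactly when the other k - 1 members come from the
    i responses ranked below it, which happens in C(i, k - 1) subsets.
    """
    n = len(rewards)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    ordered = sorted(rewards)
    weighted = sum(r * comb(i, k - 1) for i, r in enumerate(ordered))
    return weighted / comb(n, k)


if __name__ == "__main__":
    # With binary rewards, Max@k reduces to Pass@k.
    print(pass_at_k(n=4, c=1, k=2))
    print(max_at_k([0.0, 1.0, 0.0, 0.0], k=2))
```

With k = 1 both functions reduce to the ordinary mean (the risk-neutral objective). Larger k shifts weight toward the highest-reward responses, which is the risk-seeking behavior the abstract describes.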
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Multimodal Machine Learning Applications
