Pass@k Metric for RLVR: A Diagnostic Tool of Exploration, But Not an Objective
Yang Yu

TL;DR
This paper critically examines the pass@k metric used in evaluating and optimizing large language models, revealing its limitations as an optimization objective and emphasizing its role as a diagnostic tool rather than a direct goal.
Contribution
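The paper derives the gradient of the pass@k objective and shows that it is a per-example positive reweighting of the pass@1 gradient. It then discusses two consequences for reinforcement learning: the learning signal vanishes in the regimes where exploration matters most, and the policy can undergo "exploration collapse" as it concentrates probability mass.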
Findings
pass@k acts as a positive reweighting of pass@1
pass@k provides a vanishing learning signal in critical exploration regimes
pass@k is more suitable as a diagnostic tool than an optimization objective
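The reweighting finding can be illustrated with a short numerical sketch. It assumes the standard per-prompt definition in which each of the k independent samples is correct with probability p, so that pass@k = 1 - (1 - p)^k; by the chain rule, the derivative with respect to p is k(1 - p)^(k-1), which is the non-negative factor that multiplies the pass@1 gradient. The function names below are illustrative and are not taken from the paper, and the sketch does not attempt to reproduce the paper's full analysis.

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent samples is correct,
    assuming each sample is correct with probability p."""
    return 1.0 - (1.0 - p) ** k


def reweighting_factor(p: float, k: int) -> float:
    """Derivative of pass@k with respect to p: k * (1 - p)**(k - 1).

    Under the assumption above, this is the per-example weight applied
    to the pass@1 gradient. It is never negative for p in [0, 1].
    """
    return k * (1.0 - p) ** (k - 1)


if __name__ == "__main__":
    k = 8  # illustrative choice of k, not a value from the paper
    for p in (0.01, 0.1, 0.5, 0.9, 0.99):
        print(f"p={p:<5} pass@{k}={pass_at_k(p, k):.4f} "
              f"weight={reweighting_factor(p, k):.6f}")
```

With k = 1 the factor is exactly 1, which recovers pass@1. For k > 1 the factor shrinks toward zero as p approaches 1, so prompts the policy already solves reliably contribute very little gradient.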
Abstract
The ability of Large Language Models (LLMs) to perform complex, multi-step reasoning is a central focus of modern AI research. To evaluate and enhance this capability, the pass@k metric, which measures the probability of obtaining at least one correct solution in k independent samples, has received significant attention. Its intuitive appeal has led to its adoption not only as an evaluation standard but also as a direct optimization objective in reinforcement learning. In this paper, we analyze the pass@k objective, derive its gradient, and demonstrate that it is fundamentally a per-example positive reweighting of the simpler pass@1 objective. Our analysis reveals that the pass@k objective provides a vanishing learning signal in regimes where exploration is most critical. We further analyze the dynamics of "exploration collapse", showing that as the policy concentrates probability mass,…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling
