Pass@k Metric for RLVR: A Diagnostic Tool of Exploration, But Not an Objective

Yang Yu

arXiv:2511.16231·cs.LG·November 21, 2025

Open Access

TL;DR

This paper examines pass@k, a metric used both to evaluate large language models and as a direct optimization objective in reinforcement learning. It argues that pass@k has serious limitations as an objective and is better used as a diagnostic of exploration.

Contribution

The paper analyzes the pass@k objective theoretically: it derives the gradient, shows that pass@k is a per-example positive reweighting of the pass@1 objective, and examines its vanishing learning signal and the dynamics of exploration collapse in reinforcement learning.

Findings

01

pass@k acts as a per-example positive reweighting of the pass@1 objective

02

pass@k provides a vanishing learning signal in critical exploration regimes

03

pass@k is more suitable as a diagnostic tool than an optimization objective

Abstract

The ability of Large Language Models (LLMs) to perform complex, multi-step reasoning is a central focus of modern AI research. To evaluate and enhance this capability, the pass@k metric, which measures the probability of obtaining at least one correct solution in k independent samples, has received significant attention. Its intuitive appeal has led to its adoption not only as an evaluation standard but also as a direct optimization objective in reinforcement learning. In this paper, we analyze the pass@k objective, derive its gradient, and demonstrate that it is fundamentally a per-example positive reweighting of the simpler pass@1 objective. Our analysis reveals that the pass@k objective provides a vanishing learning signal in regimes where exploration is most critical. We further analyze the dynamics of "exploration collapse", showing that as the policy concentrates probability mass,…
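The reweighting claim in the abstract can be illustrated with a minimal sketch. It assumes the textbook setting in which each of the k samples for a problem is independently correct with probability p, so pass@k = 1 - (1 - p)^k and, by the chain rule, its gradient equals k(1 - p)^(k-1) times the gradient of pass@1. The function names `pass_at_k` and `reweighting_factor` are hypothetical, and the paper's own derivation may be stated differently.

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability of at least one success in k independent samples.

    Assumes each sample is correct with the same probability p.
    """
    return 1.0 - (1.0 - p) ** k


def reweighting_factor(p: float, k: int) -> float:
    """Scalar multiplying the pass@1 gradient under the assumption above.

    d/dp [1 - (1 - p)^k] = k * (1 - p)^(k - 1), which is non-negative
    for p in [0, 1] and equals 1 when k = 1 (plain pass@1).
    """
    return k * (1.0 - p) ** (k - 1)


if __name__ == "__main__":
    k = 8  # illustrative value, not taken from the paper
    for p in (0.01, 0.1, 0.5, 0.9, 0.99):
        print(
            f"p={p:<5} pass@{k}={pass_at_k(p, k):.4f} "
            f"weight={reweighting_factor(p, k):.6f}"
        )
```

Under this assumption the factor is never negative, which is the sense in which pass@k only rescales the pass@1 gradient per example, and for k > 1 it shrinks toward zero as p approaches 1. Which regime the paper identifies as the critical one for exploration is not recoverable from the truncated abstract, so the sketch makes no claim about it.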

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling