$V_0$: A Generalist Value Model for Any Policy at State Zero

Yi-Kai Zhang; Zhiyuan Yao; Hongyan Hao; Yueqing Sun; Qi Gu; Hui Su; Xunliang Cai; De-Chuan Zhan; Han-Jia Ye

arXiv:2602.03584 · cs.CL · April 1, 2026

TL;DR

This paper introduces $V_0$, a generalist value model that predicts an LLM's performance on unseen prompts without being retrained, which makes both model training and deployment more efficient.

Contribution

The paper proposes context-based value estimation at State Zero, which removes the need for parameter updates and enables effective LLM routing.

Findings

01 $V_0$ outperforms heuristic budget allocation methods.

02 $V_0$ achieves a Pareto-optimal trade-off between performance and cost.

03 $V_0$ enables efficient model routing without frequent retraining.

Abstract

Policy gradient methods rely on a baseline to measure the relative advantage of an action, ensuring the model reinforces behaviors that outperform its current average capability. In the training of Large Language Models (LLMs) using Actor-Critic methods (e.g., PPO), this baseline is typically estimated by a Value Model (Critic) often as large as the policy model itself. However, as the policy continuously evolves, the value model requires expensive, synchronous incremental training to accurately track the shifting capabilities of the policy. To avoid this overhead, Group Relative Policy Optimization (GRPO) eliminates the coupled value model by using the average reward of a group of rollouts as the baseline; yet, this approach necessitates extensive sampling to maintain estimation stability. In this paper, we propose $V_0$, a Generalist Value Model capable of estimating the expected…
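
The group-mean baseline that the abstract attributes to GRPO can be illustrated with a short sketch. This is not the paper's code: the function name `group_relative_advantages`, the `normalize` flag, and the `eps` constant are assumptions made for illustration. It only shows how the average reward over a group of rollouts for one prompt can stand in for a learned critic, and why a small group yields a noisy baseline.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, normalize=True, eps=1e-8):
    """Advantages for one prompt's group of rollouts.

    The baseline is the mean reward of the group, so no separate value
    model is needed. Names and defaults here are illustrative only.
    """
    if not rewards:
        raise ValueError("need at least one rollout reward")

    baseline = mean(rewards)  # group average replaces the critic's estimate
    centered = [r - baseline for r in rewards]
    if not normalize:
        return centered

    # Optional scaling by the group's standard deviation, as is common in
    # GRPO-style training; eps avoids division by zero when all rewards match.
    scale = pstdev(rewards) + eps
    return [c / scale for c in centered]


# Four rollouts for the same prompt with binary (pass/fail) rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards, normalize=False)
```

With only a handful of rollouts, the group mean is a high-variance estimate of the policy's expected reward on the prompt, which is the sampling cost the abstract refers to.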

Code & Models

Model (Hugging Face): Now-Join-Us/Generalist-Value-Model-V0
