TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics

Shirui Chen; Cole Harrison; Ying-Chun Lee; Angela Jin Yang; Zhongzheng Ren; Lillian J. Ratliff; Jiafei Duan; Dieter Fox; Ranjay Krishna

arXiv:2602.19313 · cs.RO · February 24, 2026

TL;DR

TOPReward is a probabilistically grounded temporal value function that uses pretrained video vision-language models (VLMs) to estimate robotic task progress zero-shot, reaching 0.947 mean VOC across 130+ real-world tasks.

Contribution

The paper presents TOPReward, a method that reads task progress directly from a VLM's token logits instead of prompting the model to output progress values as text, an approach the authors describe as prone to numerical misrepresentation.

Findings

01. Achieves 0.947 mean VOC on 130+ real-world tasks

02. Outperforms the state-of-the-art GVL baseline

03. Enables downstream applications such as success detection
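
The VOC figure in the first finding is not defined on this page. It presumably refers to Value-Order Correlation, the metric introduced with GVL, which is commonly computed as the rank correlation between predicted per-frame values and the chronological order of the frames. The following is a minimal sketch under that assumption; the function names are hypothetical and this is not the paper's evaluation code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def value_order_correlation(values):
    """Assumed VOC: Spearman correlation between predicted values
    (one per frame, in chronological order) and the frame indices."""
    time_ranks = list(range(1, len(values) + 1))
    return pearson(average_ranks(values), time_ranks)


# Values that rise monotonically over a successful episode score highest;
# values that fall over time score lowest.
rising = value_order_correlation([0.1, 0.3, 0.5, 0.9])
falling = value_order_correlation([0.9, 0.5, 0.3, 0.1])
```

Under this reading, a mean VOC close to 1 means the predicted progress values almost always increase in step with time on expert trajectories.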

Abstract

While Vision-Language-Action (VLA) models have seen rapid progress in pretraining, their advancement in Reinforcement Learning (RL) remains hampered by low sample efficiency and sparse rewards in real-world settings. Developing generalizable process reward models is essential for providing the fine-grained feedback necessary to bridge this gap, yet existing temporal value functions often fail to generalize beyond their training domains. We introduce TOPReward, a novel, probabilistically grounded temporal value function that leverages the latent world knowledge of pretrained video Vision-Language Models (VLMs) to estimate robotic task progress. Unlike prior methods that prompt VLMs to directly output progress values, which are prone to numerical misrepresentation, TOPReward extracts task progress directly from the VLM's internal token logits. In zero-shot evaluations across 130+ distinct…
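
The abstract as shown here does not spell out how progress is read off the logits. One plausible reading, given as a minimal sketch only, is to ask the VLM a completion question for each video prefix and use the probability it assigns to an affirmative answer token as the progress value. The token ids, the yes/no framing, and the function names below are assumptions for illustration and are not taken from the paper.

```python
import math


def completion_probability(logits, yes_id, no_id):
    """Two-way softmax over the (assumed) 'yes' and 'no' answer tokens.

    `logits` is a sequence of next-token logits indexed by token id.
    """
    m = max(logits[yes_id], logits[no_id])  # subtract max for stability
    e_yes = math.exp(logits[yes_id] - m)
    e_no = math.exp(logits[no_id] - m)
    return e_yes / (e_yes + e_no)


def progress_curve(per_prefix_logits, yes_id, no_id):
    """One progress value per video prefix, in chronological order."""
    return [completion_probability(l, yes_id, no_id) for l in per_prefix_logits]


# Toy logits standing in for VLM outputs on three growing video prefixes.
# A real pipeline would obtain these from the model's answer position.
YES_ID, NO_ID = 0, 1  # hypothetical token ids
toy_logits = [
    [-1.0, 1.0, 0.3],  # early prefix: 'no' dominates
    [0.0, 0.0, 0.3],   # middle prefix: undecided
    [2.0, -1.0, 0.3],  # late prefix: 'yes' dominates
]
curve = progress_curve(toy_logits, YES_ID, NO_ID)
```

Reading a probability from the logits keeps the signal continuous, whereas asking the model to write out a number depends on it generating numerals accurately, which is the weakness the abstract attributes to prior methods.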

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Social Robot Interaction and HRI