ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation

Youhe Feng; Hansen Shi; Haoyang Li; Xinlei Guo; Yang Wang; Chengyang Zhang; Jinkai Zhang; Xiaohan Zhang; Jie Tang; Jing Zhang

arXiv:2605.08774·cs.RO·May 12, 2026

TL;DR

ProcVLM is a vision-language model that learns dense, procedure-grounded progress rewards for robotic manipulation, improving task understanding and policy optimization.

Contribution

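It introduces a procedure-grounded progress estimation method that grounds progress in procedural structure and intra-stage visual change, and that infers the remaining atomic actions before estimating progress (reasoning-before-estimation). The model is trained on a large-scale annotated dataset.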

Findings

01

ProcVLM achieves superior procedural reasoning in experiments.

02

It provides more discriminative progress estimates than baseline models.

03

ProcVLM enhances reward-guided policy learning in robotic manipulation.

Abstract

Long-horizon robotic manipulation requires dense feedback that reflects how a task advances through its procedural stages, not merely whether the final outcome is successful. Existing reward models often rely on trajectory-level success labels or time-based interpolation, which can conflate elapsed time with true task progress and therefore fail to capture unfinished steps, stagnation, and failure states. We present ProcVLM, a progress-aware vision-language model that learns procedure-grounded progress as a dense reward signal for manipulation. Rather than deriving progress from terminal outcomes or temporal proxies, ProcVLM grounds progress estimation in procedural structure and intra-stage visual change, and further adopts a reasoning-before-estimation paradigm that infers the remaining atomic actions before estimating task progress. Specifically, we construct this supervision by…
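To illustrate the distinction the abstract draws between temporal proxies and procedure-grounded progress, here is a minimal sketch. It is not the authors' method: the function names, the equal weighting of stages, and the choice of reward as the step-to-step change in progress are all assumptions made for illustration only.

```python
from typing import List, Sequence


def time_based_progress(t: int, horizon: int) -> float:
    """Temporal proxy: progress is the elapsed fraction of the episode."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return min(max(t / horizon, 0.0), 1.0)


def stage_based_progress(stages_done: int, num_stages: int, intra_stage: float) -> float:
    """Hypothetical procedure-grounded proxy (an assumption, not ProcVLM itself).

    Counts completed stages plus the fractional completion of the current
    stage, with every stage weighted equally.
    """
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    if stages_done >= num_stages:
        return 1.0
    intra = min(max(intra_stage, 0.0), 1.0)
    return (stages_done + intra) / num_stages


def dense_rewards(progress: Sequence[float]) -> List[float]:
    """Assumed reward shaping: reward is the change in progress between steps."""
    return [b - a for a, b in zip(progress, progress[1:])]


# A stalled episode: the robot completes 1 of 4 stages, then makes no
# further headway for the rest of an 8-step horizon.
horizon = 8
time_curve = [time_based_progress(t, horizon) for t in range(horizon + 1)]
stage_curve = [0.0, 0.25] + [stage_based_progress(1, 4, 0.0)] * 7

final_time_progress = time_curve[-1]    # keeps rising with elapsed time
final_stage_progress = stage_curve[-1]  # stays flat while the task stalls
stall_rewards = dense_rewards(stage_curve)[1:]
```

In this toy episode the temporal proxy reports full progress at the final step even though three stages remain, while the stage-based estimate stays at one quarter and its reward drops to zero once the robot stalls. This is the conflation of elapsed time with task progress that the abstract describes.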

Code & Models

Repositories

https://procvlm.github.io