Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking

Vaidehi Bagaria; Nikshep Grampurohit; Pulkit Verma

arXiv:2605.16154·cs.LG·May 18, 2026

TL;DR

This paper introduces Probabilistic Chunk Masking (PCM), a method that cuts gradient computation in vision-language-action (VLA) reinforcement learning by backpropagating only through the trajectory chunks where successful and failed rollouts diverge, yielding a reported 2.38x wall-clock speedup over standard GRPO.

Contribution

PCM is a novel modification to GRPO that allocates gradient computation to a subset of trajectory chunks based on success-failure variance, improving efficiency without sacrificing success rates.
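To make the idea of allocating gradient computation by success-failure variance concrete, here is a minimal sketch. It is not the paper's scoring rule, which this summary does not give. It assumes a hypothetical per-chunk scalar `signal` (for example, a progress or value estimate), scores each chunk position by the squared gap between the success and failure group means, and turns the scores into keep probabilities under a hypothetical compute `budget`. The function names, `budget`, and `floor` are all illustrative assumptions.

```python
import numpy as np


def chunk_keep_probs(signal, success, budget=0.2, floor=0.02):
    """Per-chunk keep probabilities from a success/failure contrast.

    signal  : (n_rollouts, n_chunks) hypothetical per-chunk scalar (assumption).
    success : (n_rollouts,) boolean task outcome per rollout.
    budget  : target average fraction of chunks to keep (assumption).
    floor   : minimum keep probability for any chunk (assumption).
    """
    signal = np.asarray(signal, dtype=float)
    success = np.asarray(success, dtype=bool)
    n_chunks = signal.shape[1]

    # Without both outcomes in the group there is no contrast to measure,
    # so fall back to a uniform keep probability.
    if success.all() or (~success).all():
        return np.full(n_chunks, budget)

    # Squared gap between group means: large where outcomes diverge.
    gap = signal[success].mean(axis=0) - signal[~success].mean(axis=0)
    score = gap ** 2
    total = score.sum()
    if total == 0.0:
        return np.full(n_chunks, budget)

    # Spread the budget across chunks in proportion to their score.
    probs = budget * n_chunks * score / total
    return np.clip(probs, floor, 1.0)


def sample_chunk_mask(probs, n_rollouts, rng):
    """Bernoulli mask of shape (n_rollouts, n_chunks); True means backpropagate."""
    return rng.random((n_rollouts, probs.shape[0])) < probs


# Toy group: 4 rollouts, 5 chunks, outcomes differ only at chunk index 2.
signal = np.zeros((4, 5))
signal[:2, 2] = 1.0
success = np.array([True, True, False, False])

probs = chunk_keep_probs(signal, success)
mask = sample_chunk_mask(probs, n_rollouts=4, rng=np.random.default_rng(0))
```

In a training loop, a mask like this would select which chunks enter the actor-update backward pass. The `floor` keeps a small chance of visiting low-score chunks, which is a design choice of this sketch rather than something the source states.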

Findings

01  PCM achieves 2.38x wall-clock speedup over standard GRPO.

02  Fewer than 20% of trajectory chunks are backpropagated through with PCM.

03  PCM reduces peak activation memory by 60% while maintaining success rates.
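As a rough sanity check on these figures, the chunk fraction can be combined with the time split reported in the abstract (about 78% of wall-clock time per step in gradient computation). The sketch below is a back-of-the-envelope Amdahl-style estimate, not a result from the paper. It assumes gradient cost scales linearly with the fraction of chunks backpropagated and that everything else is unchanged, both of which are simplifications.

```python
def idealized_speedup(grad_frac, kept_chunk_frac):
    """Amdahl-style estimate of per-step speedup.

    grad_frac       : share of step time spent on gradient computation.
    kept_chunk_frac : fraction of chunks still backpropagated.
    Assumes (simplification) that gradient time shrinks linearly with the
    kept fraction and that all remaining time is fixed.
    """
    fixed_frac = 1.0 - grad_frac
    new_time = fixed_frac + grad_frac * kept_chunk_frac
    return 1.0 / new_time


# 78% gradient share from the abstract; 20% is the upper end of
# "fewer than 20% of trajectory chunks".
estimate = idealized_speedup(grad_frac=0.78, kept_chunk_frac=0.20)
print(f"idealized speedup: {estimate:.2f}x")
```

Under these assumptions the idealized figure comes out somewhat above the reported 2.38x, which would be consistent with costs the estimate ignores, such as forward passes and masking overhead.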

Abstract

Reinforcement learning (RL) allows vision-language-action (VLA) policies to generalize beyond their training distribution by optimizing directly for task success, but post-training is computationally expensive. A natural response has been to speed rollout collection through faster simulators and world models. In GRPO-based VLA RL, we find that the dominant cost lies elsewhere: gradient computation accounts for approximately 78% of wall-clock time per step in our runs, while rollout collection accounts for only 21%. Gradient cost dominates because much of this computation is spent on phases that contribute little to learning. GRPO's learning signal is driven by advantage variance: only phases where successful and failed rollouts diverge produce learning signal. However, GRPO assigns the same advantage to every chunk in a rollout. As a result, actor-update compute is spent uniformly…
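The point about GRPO assigning the same advantage to every chunk can be illustrated with the standard group-normalized advantage. The sketch below assumes one scalar reward per rollout (for example, binary task success) and the usual mean and standard deviation normalization within a group. The function name and the `eps` constant are illustrative, and the paper's exact variant may differ.

```python
import numpy as np


def grpo_chunk_advantages(rewards, n_chunks, eps=1e-8):
    """Group-normalized advantages, broadcast to every chunk of each rollout.

    rewards  : (n_rollouts,) scalar outcome reward per rollout (assumption).
    n_chunks : number of action chunks per rollout.
    Returns an array of shape (n_rollouts, n_chunks).
    """
    rewards = np.asarray(rewards, dtype=float)
    # Normalize each rollout's reward against its group.
    adv = (rewards - rewards.mean()) / (rewards.std() + eps)
    # Every chunk in a rollout inherits that rollout's single advantage.
    return np.repeat(adv[:, None], n_chunks, axis=1)


# Mixed group: two successes and two failures.
mixed = grpo_chunk_advantages([1.0, 0.0, 1.0, 0.0], n_chunks=3)

# Homogeneous group: every rollout succeeds, so all advantages vanish.
all_success = grpo_chunk_advantages([1.0, 1.0, 1.0, 1.0], n_chunks=3)
```

Each row of `mixed` is constant across chunks, so the update treats early and late phases of a rollout identically, and `all_success` is all zeros because a group with no outcome variance carries no learning signal.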
