ReViP: Mitigating False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance

Zhuohao Li; Yinghao Li; Jian-Jian Jiang; Lang Zhou; Tianyu Zhang; Jiadong Yin; Mu Lin; Yi-Lin Wei; Wei-Shi Zheng

arXiv:2601.16667 · cs.RO · March 13, 2026

TL;DR

ReViP introduces a vision-proprioception rebalancing framework and a benchmark suite to reduce false completions in vision-language-action models for robotic manipulation, significantly improving robustness and success rates.

Contribution

The paper presents the first False-Completion Benchmark Suite and a novel ReViP framework that uses progress-aware visual cues to mitigate modality imbalance in VLA models.

Findings

01. ReViP reduces false completion errors by 26% over baseline models.
02. The benchmark suite evaluates robustness under controlled perturbations.
03. ReViP improves success rates on real-world robotic tasks.

Abstract

Vision-Language-Action (VLA) models have advanced robotic manipulation by combining vision, language, and proprioception to predict actions. However, previous methods fuse proprioceptive signals directly with vision-language features, which leads to a state-dominant bias and to false completions, in which the policy behaves as if the task is done despite visible execution failures. We systematically analyze this failure mode and attribute it to modality imbalance: policies rely too heavily on internal state progression and underuse visual evidence. To address this, we introduce the first False-Completion Benchmark Suite, featuring eight tasks with three controlled perturbations (Object Drop, Distractor Swap, Relayout) to comprehensively evaluate false completion. Moreover, we propose ReViP, a novel VLA framework with Vision-Proprioception Rebalance to enhance visual…
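The abstract defines false completion only informally, and the excerpt ends before the evaluation protocol is described. As a minimal sketch, assuming an episode counts as a false completion when the policy behaves as if the task is finished (for example, it stops acting) while the environment's success check fails, a per-perturbation false-completion rate could be tallied as below. The `Episode` fields, the task names, and the use of all episodes as the denominator are hypothetical illustrations, not the benchmark's actual protocol.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Perturbation(Enum):
    # The three controlled perturbations named in the abstract.
    OBJECT_DROP = "Object Drop"
    DISTRACTOR_SWAP = "Distractor Swap"
    RELAYOUT = "Relayout"


@dataclass
class Episode:
    task: str                   # hypothetical task identifier
    perturbation: Perturbation
    policy_reported_done: bool  # assumption: the policy behaved as if the task was finished
    task_succeeded: bool        # assumption: ground-truth success from the environment


def false_completion_rates(episodes):
    """Per perturbation: fraction of ALL episodes where the policy 'finished' but the task failed."""
    totals = defaultdict(int)
    false_done = defaultdict(int)
    for ep in episodes:
        totals[ep.perturbation] += 1
        if ep.policy_reported_done and not ep.task_succeeded:
            false_done[ep.perturbation] += 1
    # Only perturbations that actually appear in the episode list get a rate.
    return {p: false_done[p] / totals[p] for p in totals}


# Toy, made-up episodes for illustration only (not results from the paper).
episodes = [
    Episode("pick_cube", Perturbation.OBJECT_DROP, True, False),
    Episode("pick_cube", Perturbation.OBJECT_DROP, True, True),
    Episode("stack_blocks", Perturbation.RELAYOUT, False, False),
    Episode("stack_blocks", Perturbation.RELAYOUT, True, False),
    Episode("stack_blocks", Perturbation.RELAYOUT, True, True),
    Episode("stack_blocks", Perturbation.RELAYOUT, False, False),
]
rates = false_completion_rates(episodes)
```

Here `rates` maps Object Drop to 0.5 and Relayout to 0.25, while Distractor Swap is absent because no toy episode used it. Dividing by all episodes, rather than only by those the policy marked as done, is one design choice among several; a benchmark could equally report the share of claimed completions that were false.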

Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI