ReViP: Mitigating False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance
Zhuohao Li, Yinghao Li, Jian-Jian Jiang, Lang Zhou, Tianyu Zhang, Jiadong Yin, Mu Lin, Yi-Lin Wei, Wei-Shi Zheng

TL;DR
ReViP introduces a vision-proprioception rebalancing framework and a benchmark suite to reduce false completions in vision-language-action models for robotic manipulation, significantly improving robustness and success rates.
Contribution
The paper presents the first False-Completion Benchmark Suite and a novel ReViP framework that uses progress-aware visual cues to mitigate modality imbalance in VLA models.
Findings
ReViP reduces false completion errors by 26% over baseline models.
The benchmark suite effectively evaluates robustness under controlled perturbations.
ReViP improves success rates on real-world robotic tasks.
Abstract
Vision-Language-Action (VLA) models have advanced robotic manipulation by combining vision, language, and proprioception to predict actions. However, previous methods fuse proprioceptive signals directly with vision-language features, resulting in state-dominant bias and false completions despite visible execution failures. We systematically analyze this failure mode, attributing it to modality imbalance, where policies overly rely on internal state progression and underuse visual evidence. To address this, we introduce the first False-Completion Benchmark Suite, featuring eight tasks with three controlled perturbations (Object Drop, Distractor Swap, Relayout) to comprehensively evaluate false completion. Moreover, we propose ReViP, a novel VLA framework with Vision-Proprioception Rebalance to enhance visual…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
