RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
Weijia Liufu, Xiaoyu Guo, Ruiyi Chen, Jingzhi Liu, Kaidong Zhang, Xiwen Liang, Jianqi Lin, Dawei Sun, Yuze Wang, Rongtao Xu, Bingqian Lin, Bowen Yang, Tongtong Cao, Bowen Peng, Dongyu Zhang, Guangrun Wang, Min Wang, Liang Lin, Xiaodan Liang

TL;DR
RePO-VLA is a recovery-driven policy optimization framework that makes vision-language-action (VLA) models more robust in long-horizon, contact-rich manipulation. It does this by learning from recovery behaviors and a progress-aware semantic value function, and it substantially improves success rates in both simulated and real-world experiments.
Contribution
It introduces Recovery-Aware Initialization (RAI), a Progress-Aware Semantic Value Function (PAS-VF), and a recovery-focused training pipeline for VLA models, improving robustness without requiring online failure detection.
Findings
Success rate increased from 20% to 75% on average in adversarial tests.
RePO-VLA achieves up to 80% success in real-world trials.
The framework improves handling of long-horizon, contact-rich manipulation tasks.
Abstract
Vision-Language-Action (VLA) models remain brittle in long-horizon, contact-rich manipulation because success-only imitation provides little supervision for execution drift, while failed rollouts are often discarded. We introduce RePO-VLA, a recovery-driven policy optimization framework that assigns distinct roles to success, recovery, and failure trajectories. RePO-VLA first applies Recovery-Aware Initialization (RAI), slicing recovery segments and resetting history so corrective actions depend on the current adverse state rather than the preceding failure. It then learns a Progress-Aware Semantic Value Function (PAS-VF), aligning spatiotemporal trajectory features with instructions and successful references. The resulting labels salvage useful failure prefixes via reliability decay, while low-value labels mark drift and terminal breakdowns, teaching differences among nominal, failed,…
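The abstract outlines two data-side mechanisms without giving their exact form: RAI slicing (keep the recovery segment and reset history so corrective actions see only the current adverse state) and labeling failed rollouts (salvage useful prefixes via reliability decay, give low labels to drift and terminal breakdowns). The sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation. The `Step` type, every function name, the `recovery_start` and `drift_onset` indices, and the exponential weighting `decay ** t` are hypothetical. The abstract does not say how recovery onset or drift is detected, or how PAS-VF computes its values, so per-step values are simply passed in here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Step:
    obs: str     # hypothetical placeholder for an observation
    action: str  # hypothetical placeholder for the action taken


def slice_recovery_segment(traj: List[Step], recovery_start: int) -> List[Step]:
    """Keep only the corrective part of a recovery trajectory (assumed RAI-style slicing).

    Steps before `recovery_start` (the failure that caused the adverse state)
    are dropped, so they cannot appear in the training sample.
    """
    if not 0 <= recovery_start < len(traj):
        raise ValueError("recovery_start out of range")
    return traj[recovery_start:]


def history_window(segment: List[Step], t: int, max_history: int) -> List[Step]:
    """Observation history for step t, reset at the start of the segment.

    Because the segment begins at the adverse state, the window can never
    reach back into the preceding failure.
    """
    if not 0 <= t < len(segment):
        raise ValueError("t out of range")
    start = max(0, t - max_history + 1)
    return segment[start:t + 1]


def label_failed_rollout(values: Sequence[float], drift_onset: int,
                         decay: float = 0.9, low_label: float = 0.0) -> List[float]:
    """Assign per-step labels to a failed rollout (hypothetical scheme).

    values      : per-step value estimates in [0, 1], assumed to come from a
                  learned value function (here simply passed in)
    drift_onset : index of the first step judged to be drifting (assumed given)
    decay       : assumed per-step reliability decay factor in (0, 1]
    """
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    if not 0 <= drift_onset <= len(values):
        raise ValueError("drift_onset out of range")
    labels = []
    for t, v in enumerate(values):
        if t < drift_onset:
            # Salvaged prefix step: its value is trusted less the later it occurs.
            labels.append(v * decay ** t)
        else:
            # Drift or terminal breakdown: mark with a low value label.
            labels.append(low_label)
    return labels


if __name__ == "__main__":
    traj = [Step(f"o{i}", f"a{i}") for i in range(5)]
    seg = slice_recovery_segment(traj, recovery_start=3)
    print([s.obs for s in seg])                         # ['o3', 'o4']
    print([s.obs for s in history_window(seg, 0, 4)])   # ['o3']
    print(label_failed_rollout([1.0, 0.8, 0.6, 0.5], drift_onset=2, decay=0.5))
    # [1.0, 0.4, 0.0, 0.0]
```

With `decay=1.0` every step before drift keeps its full value. Smaller values make the salvaged prefix count for less as it approaches the failure. The exponential form is only one plausible choice of decay.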
