What to Ignore, What to React: Visually Robust RL Fine-Tuning of VLA Models
Yuanfang Peng, Jingjing Fu, Chuheng Zhang, Li Zhao, Jiang Bian, Mingyu Liu, Ling Zhang, Jun Zhang, Rui Wang

TL;DR
This paper introduces PAIR-VLA, a reinforcement learning fine-tuning framework that improves the visual robustness of VLA models in robotic manipulation. It uses paired visual variants to teach the policy which visual changes to ignore and which to react to.
Contribution
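The authors propose an RL fine-tuning method that adds two auxiliary objectives to PPO, computed over paired visual variants: an invariance term for task-preserving pairs and a sensitivity term for task-altering pairs. Together these improve policy robustness to visual shifts.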
Findings
Consistently improves over standard PPO across diverse visual shift scenarios.
Achieves 16.62% and 9.10% average improvements on two VLA architectures.
Demonstrates transferability of invariance and sensitivity guidance across different visual shifts.
Abstract
Reinforcement learning (RL) fine-tuning has shown promise for Vision-Language-Action (VLA) models in robotic manipulation, but deployment-time visual shifts pose practical challenges. A key difficulty is that standard task rewards supervise task success, but offer limited guidance on whether a visual change is task-irrelevant or changes the behavior required for manipulation. We propose PAIR-VLA (Paired Action Invariance & Sensitivity for Visually Robust VLA), an RL fine-tuning framework to address this difficulty by adding two auxiliary objectives over paired visual variants during PPO optimization: an invariance term that reduces the discrepancy between action distributions for a task-preserving pair (e.g., different distractors), and a sensitivity objective that encourages separable action distributions for a task-altering pair (e.g., target object in a different pose). Together,…
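The two auxiliary objectives described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: action distributions are modeled as diagonal Gaussians, the discrepancy is a symmetric KL divergence, the sensitivity term is a hinge with a hypothetical `margin`, and the weights `w_inv` and `w_sens` are placeholder values. The function names are hypothetical as well.

```python
import math

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for diagonal Gaussians given as per-dimension lists."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total

def symmetric_kl(p, q):
    """Symmetrized KL between two (mean, std) action distributions (assumed metric)."""
    (mu_p, sig_p), (mu_q, sig_q) = p, q
    return 0.5 * (gaussian_kl(mu_p, sig_p, mu_q, sig_q)
                  + gaussian_kl(mu_q, sig_q, mu_p, sig_p))

def invariance_loss(anchor, preserving):
    """Penalize discrepancy between action distributions on a task-preserving pair."""
    return symmetric_kl(anchor, preserving)

def sensitivity_loss(anchor, altering, margin=1.0):
    """Hinge (assumed form): penalize a task-altering pair whose action
    distributions are closer than `margin`, encouraging separation."""
    return max(0.0, margin - symmetric_kl(anchor, altering))

def pair_objective(ppo_loss, anchor, preserving, altering,
                   w_inv=0.1, w_sens=0.1, margin=1.0):
    """PPO loss plus the two weighted auxiliary terms (weights are placeholders)."""
    return (ppo_loss
            + w_inv * invariance_loss(anchor, preserving)
            + w_sens * sensitivity_loss(anchor, altering, margin))

# Toy 2-D action distributions as (mean, std) pairs.
anchor = ([0.0, 0.0], [1.0, 1.0])      # policy output on the original observation
distractor = ([0.0, 0.0], [1.0, 1.0])  # task-preserving variant: same behavior expected
new_pose = ([2.0, 0.0], [1.0, 1.0])    # task-altering variant: different behavior expected

total = pair_objective(0.5, anchor, distractor, new_pose)
```

In this toy case both auxiliary terms are zero, since the policy already ignores the distractor and already separates its actions for the new target pose, so `total` equals the PPO loss alone. Swapping the two variants would make both terms positive, which is the case the auxiliary objectives are meant to penalize.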
