ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
Sombit Dey, Jan-Nico Zaech, Nikolay Nikolov, Luc Van Gool, Danda Pani Paudel

TL;DR
This paper investigates the limitations of current robotic foundation models in visual out-of-domain generalization, identifies catastrophic forgetting as a key issue, and proposes a model merging technique to enhance visual robustness, significantly improving task performance.
Contribution
It introduces ReVLA, a novel approach that mitigates visual catastrophic forgetting in robotic models through model merging, boosting out-of-domain task performance.
Findings
ReVLA improves grasping and lifting success in visual OOD tasks by 77% and 66%, respectively, relative to OpenVLA.
Existing models lack robustness to visual out-of-domain scenarios.
The proposed model merging technique effectively restores visual generalization.
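To make the idea of model merging concrete, here is a minimal sketch of the generic technique of linear weight interpolation between a pretrained and a fine-tuned copy of the same network. It is not taken from the paper: the function name `merge_weights`, the mixing coefficient `alpha`, and the use of plain Python lists as stand-ins for weight tensors are all assumptions made for illustration, and the authors' actual merging procedure may differ.

```python
def merge_weights(pretrained, finetuned, alpha):
    """Linearly interpolate two parameter dicts with identical keys and shapes.

    alpha = 0.0 returns the pretrained weights, alpha = 1.0 the fine-tuned ones.
    Parameters are flat lists of floats here (hypothetical stand-ins for tensors).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if pretrained.keys() != finetuned.keys():
        raise KeyError("both models must expose the same parameter names")

    merged = {}
    for name, p_values in pretrained.items():
        f_values = finetuned[name]
        if len(p_values) != len(f_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the two weight vectors.
        merged[name] = [
            (1.0 - alpha) * p + alpha * f for p, f in zip(p_values, f_values)
        ]
    return merged


# Toy example: one "layer" with two weights per model.
pretrained = {"encoder.w": [0.0, 2.0]}
finetuned = {"encoder.w": [1.0, 4.0]}
halfway = merge_weights(pretrained, finetuned, alpha=0.5)
```

In this sketch, moving `alpha` toward 0 pulls the merged encoder back toward its pretrained state, which is the intuition behind using merging to recover visual generalization lost during fine-tuning.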
Abstract
Recent progress in large language models and access to large-scale robotic datasets have sparked a paradigm shift in robotics models, transforming them into generalists able to adapt to various tasks, scenes, and robot modalities. A large step for the community is the release of open Vision Language Action models, which showcase strong performance in a wide variety of tasks. In this work, we study the visual generalization capabilities of three existing robotic foundation models and propose a corresponding evaluation framework. Our study shows that the existing models do not exhibit robustness to visual out-of-domain scenarios. This is potentially caused by limited variations in the training data and/or catastrophic forgetting, leading to domain limitations in the vision foundation models. We further explore OpenVLA, which uses two pre-trained vision foundation models and is, therefore, expected to…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
