ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models

Sombit Dey; Jan-Nico Zaech; Nikolay Nikolov; Luc Van Gool; Danda Pani Paudel

arXiv:2409.15250·cs.CV·May 21, 2025·2 cites

TL;DR

This paper investigates the limitations of current robotic foundation models in visual out-of-domain generalization, identifies catastrophic forgetting as a key issue, and proposes a model merging technique to enhance visual robustness, significantly improving task performance.

Contribution

It introduces ReVLA, a novel approach that mitigates visual catastrophic forgetting in robotic models through model merging, boosting out-of-domain task performance.
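
The summary above does not spell out the merging rule, so the following is only a minimal sketch of one common form of model merging: linear interpolation between a fine-tuned vision encoder and its original pretrained checkpoint. The function name `merge_weights`, the coefficient `alpha`, the parameter names, and the toy values are all hypothetical and are not taken from the paper or its released code.

```python
from typing import Dict, List

# A "checkpoint" here is a plain mapping from parameter name to a flat list of
# floats. Real checkpoints hold tensors; lists keep the sketch dependency-free.
Weights = Dict[str, List[float]]


def merge_weights(finetuned: Weights, pretrained: Weights, alpha: float) -> Weights:
    """Return (1 - alpha) * finetuned + alpha * pretrained for every parameter.

    alpha = 0 keeps the fine-tuned encoder unchanged; alpha = 1 fully reverts
    it to the pretrained weights. (Assumed merge rule, for illustration only.)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if finetuned.keys() != pretrained.keys():
        raise ValueError("both checkpoints must have the same parameter names")

    merged: Weights = {}
    for name, ft_values in finetuned.items():
        pt_values = pretrained[name]
        if len(ft_values) != len(pt_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * ft + alpha * pt
            for ft, pt in zip(ft_values, pt_values)
        ]
    return merged


# Toy values, chosen only so the arithmetic is easy to follow.
finetuned_encoder = {"patch_embed.weight": [0.0, 2.0], "norm.bias": [1.0]}
pretrained_encoder = {"patch_embed.weight": [1.0, 0.0], "norm.bias": [3.0]}

half_reverted = merge_weights(finetuned_encoder, pretrained_encoder, alpha=0.5)
fully_reverted = merge_weights(finetuned_encoder, pretrained_encoder, alpha=1.0)
```

Under this assumed rule, raising `alpha` moves the encoder back toward the pretrained features that fine-tuning may have overwritten, while the rest of the policy is left untouched.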

Findings

01 ReVLA improves grasping and lifting in visual out-of-domain (OOD) tasks by 77% and 66%, respectively.

02 Existing models lack robustness to visual out-of-domain scenarios.

03 The proposed model merging technique effectively restores visual generalization.

Abstract

Recent progress in large language models and access to large-scale robotic datasets have sparked a paradigm shift in robotics models, transforming them into generalists able to adapt to various tasks, scenes, and robot modalities. Open Vision Language Action models, which showcase strong performance in a wide variety of tasks, are a large step for the community. In this work, we study the visual generalization capabilities of three existing robotic foundation models and propose a corresponding evaluation framework. Our study shows that the existing models do not exhibit robustness to visual out-of-domain scenarios. This is potentially caused by limited variations in the training data and/or catastrophic forgetting, leading to domain limitations in the vision foundation models. We further explore OpenVLA, which uses two pre-trained vision foundation models and is, therefore, expected to…

Code & Models

Models

🤗 INSAIT-Institute/ReVLA-Bridge
model · 2 dl · ♡ 1

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Neural Network Applications