ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval

Shahram Najam Syed, Yatharth Ahuja, Arthur Jakobsson, and Jeff Ichnowski

arXiv:2511.06202 · cs.RO · March 9, 2026

TL;DR

ExpReS-VLA introduces a method for rapid, memory-efficient adaptation of vision-language-action models to specific robotic tasks, significantly improving performance and robustness through experience replay and retrieval techniques.

Contribution

The paper presents ExpReS-VLA, a novel approach combining compressed experience replay, retrieval augmentation, and a hybrid contrastive loss for fast on-device adaptation of VLA models in robotics.

Findings

01. Achieves up to 93.1% accuracy on spatial reasoning tasks.

02. Improves long-horizon task success from 61% to 72.3%.

03. Demonstrates 98% success rate on physical robots in diverse conditions.

Abstract

Vision-Language-Action (VLA) models like OpenVLA demonstrate impressive zero-shot generalization across robotic manipulation tasks but struggle to adapt to specific deployment environments where consistent high performance on a limited set of tasks is more valuable than broad generalization. We present EXPerience replayed, REtrieval augmented, Specialized VLA (ExpReS-VLA), a method that enables rapid on-device adaptation of pre-trained VLAs to target domains while preventing catastrophic forgetting through compressed experience replay and retrieval-augmented generation. Our approach maintains a memory-efficient buffer by storing extracted embeddings from OpenVLA's frozen vision backbone, reducing storage requirements by 97% compared to raw image-action pairs. During deployment, ExpReS-VLA retrieves the $k$ most similar past experiences using cosine similarity to augment training…
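The retrieval step described in the abstract (keep a buffer of stored embeddings, then fetch the $k$ entries most similar to the current observation by cosine similarity) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `EmbeddingBuffer` class, its `add` and `retrieve` methods, the two-dimensional toy vectors, and the string action labels are stand-ins. In the actual system the embeddings would come from OpenVLA's frozen vision backbone, and a vectorized library would likely replace the pure-Python loops.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class EmbeddingBuffer:
    """Hypothetical replay buffer of (embedding, action) pairs; not the paper's code."""

    embeddings: list = field(default_factory=list)
    actions: list = field(default_factory=list)

    def add(self, embedding, action):
        # Store the L2-normalized embedding so a dot product equals cosine similarity.
        norm = math.sqrt(sum(x * x for x in embedding))
        if norm == 0.0:
            raise ValueError("cannot store a zero-norm embedding")
        self.embeddings.append([x / norm for x in embedding])
        self.actions.append(action)

    def retrieve(self, query, k):
        """Return up to k (similarity, action) pairs, most similar first."""
        norm = math.sqrt(sum(x * x for x in query))
        if norm == 0.0:
            raise ValueError("query must be non-zero")
        q = [x / norm for x in query]
        # Score every stored embedding against the normalized query.
        scored = (
            (sum(a * b for a, b in zip(q, emb)), idx)
            for idx, emb in enumerate(self.embeddings)
        )
        top = heapq.nlargest(k, scored)
        return [(sim, self.actions[idx]) for sim, idx in top]


# Toy usage with made-up 2-D "embeddings" and action labels.
buffer = EmbeddingBuffer()
buffer.add([1.0, 0.0], "reach")
buffer.add([0.0, 1.0], "grasp")
buffer.add([1.0, 1.0], "lift")

nearest = buffer.retrieve([0.9, 0.1], k=2)
print([action for _, action in nearest])  # ['reach', 'lift']
```

Normalizing once at insertion time is a design choice of this sketch: it turns each later similarity query into a plain dot product, which matters when the buffer is queried far more often than it is written.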

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning