ExpReS-VLA: Specializing Vision-Language-Action Models Through Experience Replay and Retrieval
Shahram Najam Syed, Yatharth Ahuja, Arthur Jakobsson, and Jeff Ichnowski

TL;DR
ExpReS-VLA introduces a method for rapid, memory-efficient adaptation of vision-language-action models to specific robotic tasks, significantly improving performance and robustness through experience replay and retrieval techniques.
Contribution
The paper presents ExpReS-VLA, a novel approach combining compressed experience replay, retrieval augmentation, and a hybrid contrastive loss for fast on-device adaptation of VLA models in robotics.
Findings
Achieves up to 93.1% accuracy on spatial reasoning tasks.
Improves long-horizon task success from 61% to 72.3%.
Demonstrates 98% success rate on physical robots in diverse conditions.
Abstract
Vision-Language-Action (VLA) models like OpenVLA demonstrate impressive zero-shot generalization across robotic manipulation tasks but struggle to adapt to specific deployment environments, where consistent high performance on a limited set of tasks is more valuable than broad generalization. We present EXPerience replayed, REtrieval augmented, Specialized VLA (ExpReS-VLA), a method that enables rapid on-device adaptation of pre-trained VLAs to target domains while preventing catastrophic forgetting through compressed experience replay and retrieval-augmented generation. Our approach maintains a memory-efficient buffer by storing extracted embeddings from OpenVLA's frozen vision backbone, reducing storage requirements by 97% compared to raw image-action pairs. During deployment, ExpReS-VLA retrieves the most similar past experiences using cosine similarity to augment training…
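The retrieval step mentioned in the abstract (ranking stored embeddings by cosine similarity to the current observation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cosine_similarity` and `retrieve_top_k`, the toy 3-dimensional vectors, and the choice of `k` are all assumptions made for illustration, and plain Python lists stand in for the embeddings that would come from the frozen vision backbone.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float],
    buffer: List[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (buffer index, similarity) for the k entries most similar to query."""
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(buffer)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Hypothetical replay buffer of stored embeddings (toy values, not real features).
buffer = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]  # hypothetical embedding of the current observation

top = retrieve_top_k(query, buffer, k=2)
for idx, score in top:
    print(f"buffer[{idx}] similarity={score:.4f}")
```

In this toy example the two buffer entries pointing in nearly the same direction as the query rank first. How the retrieved experiences are then used during training is not specified in the truncated abstract, so the sketch stops at retrieval.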
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning
