MAP-VLA: Memory-Augmented Prompting for Vision-Language-Action Model in Robotic Manipulation

Runhao Li; Wenkai Guo; Zhenyu Wu; Changyuan Wang; Haoyuan Deng; Zhenyu Weng; Yap-Peng Tan; Ziwei Wang

arXiv:2511.09516·cs.RO·November 13, 2025

Open Access

TL;DR

MAP-VLA introduces a memory-augmented prompting framework that enhances pre-trained vision-language-action models with demonstration-derived memory prompts, significantly improving long-horizon robotic manipulation performance in simulation and real-world tasks.

Contribution

The paper presents a novel plug-and-play memory-augmented prompting method for VLA models, enabling better handling of long-horizon tasks without retraining the entire model.

Findings

01. Up to 7.0% performance improvement in simulation benchmarks.

02. Up to 25.0% performance improvement on real robot tasks.

03. Effective retrieval and integration of demonstration memory enhances action generation.

Abstract

Pre-trained Vision-Language-Action (VLA) models have achieved remarkable success in improving robustness and generalization for end-to-end robotic manipulation. However, these models struggle with long-horizon tasks due to their lack of memory and reliance solely on immediate sensory inputs. To address this limitation, we propose Memory-Augmented Prompting for Vision-Language-Action model (MAP-VLA), a novel framework that empowers pre-trained VLA models with demonstration-derived memory prompts to augment action generation for long-horizon robotic manipulation tasks. To achieve this, MAP-VLA first constructs a memory library from historical demonstrations, where each memory unit captures information about a specific stage of a task. These memory units are implemented as learnable soft prompts optimized through prompt tuning. Then, during real-time task execution, MAP-VLA retrieves…
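
The memory library described above can be pictured with a minimal Python sketch. It is illustrative only and is not the authors' implementation: the names `MemoryUnit`, `MemoryLibrary`, `retrieve`, and `augment` are hypothetical, the soft prompts are fixed toy vectors rather than parameters learned by prompt tuning, and the cosine-similarity lookup and the prepending of prompt tokens are stand-in assumptions, since the abstract text here ends before the retrieval step is described.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class MemoryUnit:
    """One stage of one task, stored as a soft prompt (hypothetical layout)."""
    task: str
    stage: int
    key: List[float]                # assumed stage embedding used for lookup
    soft_prompt: List[List[float]]  # prompt token vectors (toy values here)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryLibrary:
    """Collection of stage-level memory units built from demonstrations."""

    def __init__(self) -> None:
        self.units: List[MemoryUnit] = []

    def add(self, unit: MemoryUnit) -> None:
        self.units.append(unit)

    def retrieve(self, task: str, query: Sequence[float]) -> MemoryUnit:
        # Assumption: pick the unit of this task whose key is most similar
        # to the query. The source does not specify the retrieval rule.
        candidates = [u for u in self.units if u.task == task]
        if not candidates:
            raise KeyError(f"no memory units stored for task {task!r}")
        return max(candidates, key=lambda u: cosine(u.key, query))


def augment(input_tokens: List[List[float]], unit: MemoryUnit) -> List[List[float]]:
    """Assumption: prepend the soft prompt tokens to the model input tokens."""
    return unit.soft_prompt + input_tokens


# Toy usage with two stages of a made-up task.
library = MemoryLibrary()
library.add(MemoryUnit("stack_blocks", 0, [1.0, 0.0], [[0.1, 0.1]]))
library.add(MemoryUnit("stack_blocks", 1, [0.0, 1.0], [[0.9, 0.9]]))

unit = library.retrieve("stack_blocks", [0.2, 0.8])
inputs = augment([[0.5, 0.5]], unit)
```

In this toy run the query lies closer to the key of the second stage, so that unit's prompt tokens are placed in front of the input tokens. In a real system the keys and prompts would come from the trained model rather than hand-written lists.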

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Robotic Path Planning Algorithms