RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies
Yinpei Dai, Hongze Fu, Jayjun Lee, Yuejiang Liu, Haoran Zhang, Jianing Yang, Chelsea Finn, Nima Fazeli, Joyce Chai

TL;DR
RoboMME introduces a comprehensive benchmark and analysis of memory mechanisms in vision-language-action models for complex robotic manipulation tasks, highlighting task-dependent effectiveness of different memory strategies.
Contribution
It provides a large-scale standardized benchmark of 16 manipulation tasks for evaluating memory in VLA models, and systematically compares 14 memory-augmented VLA variants built on the π0.5 backbone.
Findings
Memory effectiveness varies significantly across tasks.
Different memory architectures have unique strengths and weaknesses.
The benchmark enables systematic comparison of memory mechanisms and measurement of progress.
Abstract
Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluations remain confined to narrow, non-standardized settings. This limits their systematic understanding, comparison, and progress measurement. To address these challenges, we introduce RoboMME: a large-scale standardized benchmark for evaluating and advancing VLA models in long-horizon, history-dependent scenarios. Our benchmark comprises 16 manipulation tasks constructed under a carefully designed taxonomy that evaluates temporal, spatial, object, and procedural memory. We further develop a suite of 14 memory-augmented VLA variants built on the π0.5 backbone to systematically explore…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI
