RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark

Huashuo Lei; Wenxuan Song; Huarui Zhang; Jieyuan Pei; Jiayi Chen; Haodong Yan; Han Zhao; Pengxiang Ding; Zhipeng Zhang; Lida Huang; Donglin Wang; Yan Wang; Haoang Li

arXiv:2605.10921·cs.RO·May 12, 2026

TL;DR

RoboMemArena is a large-scale robotic memory benchmark with multimodal annotations and paired real-world evaluation, designed to assess and improve performance on long-horizon, memory-dependent robotic tasks.

Contribution

The paper contributes a benchmark with diverse tasks, memory-related annotations, and a novel vision-language-action (VLA) system, addressing the limitations of existing benchmarks and supporting research on robotic memory systems.

Findings

01. PrediMem outperforms all baselines in the paper's experiments.

02. 68.9% of RoboMemArena subtasks are memory-dependent.

03. The benchmark includes paired real-world tasks for physical evaluation.

Abstract

Memory is a critical component of robotic intelligence, as robots must rely on past observations and actions to accomplish long-horizon tasks in partially observable environments. However, existing robotic memory benchmarks still lack multimodal annotations for memory formation, provide limited task coverage and structural complexity, and remain restricted to simulation without real-world evaluation. We address this gap with RoboMemArena, a large-scale benchmark of 26 tasks, with average trajectory lengths exceeding 1,000 steps per task and 68.9% of subtasks being memory-dependent. The generation pipeline leverages a vision-language model (VLM) to design and compose subtasks, generates full trajectories through atomic functions, and provides memory-related annotations, including subtask instructions and native keyframe annotations, while paired real-world memory tasks support physical…
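To make the annotation structure described in the abstract concrete, here is a minimal sketch of how a trajectory with subtask instructions and keyframe annotations might be represented, and how the share of memory-dependent subtasks could be computed. All names (`Subtask`, `Trajectory`, `memory_dependent_fraction`) and the example values are hypothetical illustrations, not the benchmark's actual data format or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subtask:
    """Hypothetical record for one annotated subtask (not the real schema)."""
    instruction: str          # natural-language subtask instruction
    start_step: int           # first trajectory step (inclusive)
    end_step: int             # last trajectory step (exclusive)
    memory_dependent: bool    # True if it relies on past observations or actions
    keyframes: List[int] = field(default_factory=list)  # annotated step indices


@dataclass
class Trajectory:
    """Hypothetical container for one task trajectory and its subtasks."""
    task_name: str
    subtasks: List[Subtask]

    @property
    def num_steps(self) -> int:
        # Trajectory length is taken as the end of the last subtask.
        return max((s.end_step for s in self.subtasks), default=0)


def memory_dependent_fraction(trajectories: List[Trajectory]) -> float:
    """Fraction of all subtasks flagged as memory-dependent."""
    subtasks = [s for t in trajectories for s in t.subtasks]
    if not subtasks:
        return 0.0
    return sum(s.memory_dependent for s in subtasks) / len(subtasks)


# Made-up example data, for illustration only.
demo = Trajectory(
    task_name="hypothetical_task",
    subtasks=[
        Subtask("open the drawer", 0, 300, False, [120]),
        Subtask("pick the object seen earlier", 300, 700, True, [310, 650]),
        Subtask("return it to its original place", 700, 1100, True, [1050]),
    ],
)

print(demo.num_steps)
print(round(memory_dependent_fraction([demo]), 3))
```

Applied over a full dataset, a statistic of this kind is what a figure such as the reported 68.9% would summarize, assuming each subtask carries a memory-dependence label.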

Code & Models

Repositories

openhelix-team/RoboMemArena (GitHub)
