TL;DR
RoboMemArena is a large-scale, multimodal robotic memory benchmark with real-world evaluation, designed to assess and advance robot performance on long-horizon, memory-dependent tasks.
Contribution
It provides a comprehensive benchmark with diverse tasks and memory-related annotations, together with a novel vision-language-action (VLA) system. This addresses the limitations of existing benchmarks and enables research on more advanced robotic memory systems.
Findings
PrediMem outperforms all baselines in experiments.
68.9% of the subtasks in RoboMemArena are memory-dependent.
The benchmark includes real-world tasks for physical evaluation.
Abstract
Memory is a critical component of robotic intelligence, as robots must rely on past observations and actions to accomplish long-horizon tasks in partially observable environments. However, existing robotic memory benchmarks still lack multimodal annotations for memory formation, provide limited task coverage and structural complexity, and remain restricted to simulation without real-world evaluation. We address this gap with RoboMemArena, a large-scale benchmark of 26 tasks, with average trajectory lengths exceeding 1,000 steps per task and 68.9% of subtasks being memory-dependent. The generation pipeline leverages a vision-language model (VLM) to design and compose subtasks, generates full trajectories through atomic functions, and provides memory-related annotations, including subtask instructions and native keyframe annotations, while paired real-world memory tasks support physical…
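To make the annotation structure described in the abstract more concrete, here is a minimal Python sketch of one way to represent an annotated trajectory: consecutive subtasks, each with an instruction, a memory-dependence flag, and keyframe indices. It also shows how summary statistics such as the share of memory-dependent subtasks and the mean trajectory length could be computed. The class and field names (`Subtask`, `Trajectory`, `memory_dependent`, `keyframes`) and the toy episode are hypothetical illustrations, not RoboMemArena's actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    """One annotated subtask (hypothetical schema, not the official format)."""
    instruction: str  # natural-language subtask instruction
    memory_dependent: bool  # True if success needs information no longer in view
    # Hypothetical: earlier step indices whose observations this subtask must recall.
    keyframes: list[int] = field(default_factory=list)


@dataclass
class Trajectory:
    """A full episode composed of consecutive subtasks (hypothetical schema)."""
    task_name: str
    num_steps: int
    subtasks: list[Subtask]

    def validate(self) -> None:
        # Every keyframe must point at a real step of this trajectory.
        for sub in self.subtasks:
            for k in sub.keyframes:
                if not 0 <= k < self.num_steps:
                    raise ValueError(f"keyframe {k} outside [0, {self.num_steps})")


def memory_dependent_fraction(trajectories: list[Trajectory]) -> float:
    """Share of all subtasks, pooled across trajectories, that are memory-dependent."""
    subtasks = [s for t in trajectories for s in t.subtasks]
    if not subtasks:
        return 0.0
    return sum(s.memory_dependent for s in subtasks) / len(subtasks)


def mean_trajectory_length(trajectories: list[Trajectory]) -> float:
    """Average number of steps per trajectory."""
    if not trajectories:
        return 0.0
    return sum(t.num_steps for t in trajectories) / len(trajectories)


if __name__ == "__main__":
    # Toy episode (invented): the robot must remember which cup hides the block.
    toy = Trajectory(
        task_name="find_hidden_block",
        num_steps=1200,
        subtasks=[
            Subtask("Place the red block under the left cup", False),
            Subtask("Lift the cup that hides the red block", True, keyframes=[310]),
        ],
    )
    toy.validate()
    print(memory_dependent_fraction([toy]))  # 0.5
    print(mean_trajectory_length([toy]))  # 1200.0
```

Pooling subtasks across all trajectories, as `memory_dependent_fraction` does here, is one plausible way to compute a figure like the 68.9% statistic. The paper may define it differently, for example by averaging per task.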
