Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration
Sen Wang, Bangwei Liu, Zhenkun Gao, Lizhuang Ma, Xuhong Wang, Yuan Xie, Xin Tan

TL;DR
This paper introduces Long-term Memory Embodied Exploration (LMEE), a task formulation with an accompanying benchmark and a multimodal LLM-based framework that uses long-term memory to improve lifelong learning and proactive exploration in complex environments.
Contribution
It proposes a new benchmark, LMEE-Bench, and a novel method, MemoryExplorer, for enhancing long-term memory utilization and exploration in embodied agents using reinforcement learning.
Findings
MemoryExplorer improves proactive exploration in long-horizon tasks.
The LMEE-Bench dataset enables comprehensive evaluation of exploration processes.
The approach outperforms existing models in embodied exploration tasks.
Abstract
An ideal embodied agent should possess lifelong learning capabilities to handle long-horizon and complex tasks, enabling continuous operation in general environments. This not only requires the agent to accurately accomplish given tasks but also to leverage long-term episodic memory to optimize decision-making. However, existing mainstream one-shot embodied tasks primarily focus on task completion results, neglecting the crucial process of exploration and memory utilization. To address this, we propose Long-term Memory Embodied Exploration (LMEE), which aims to unify the agent's exploratory cognition and decision-making behaviors to promote lifelong learning. We further construct a corresponding dataset and benchmark, LMEE-Bench, incorporating multi-goal navigation and memory-based question answering to comprehensively evaluate both the process and outcome of embodied exploration. To…
Taxonomy
Topics: Multimodal Machine Learning Applications · Action Observation and Synchronization · Social Robot Interaction and HRI
