ReEXplore: Improving MLLMs for Embodied Exploration with Contextualized Retrospective Experience Replay
Gengyuan Zhang, Mingcong Ding, Jingpei Wu, Ruotong Liao, Volker Tresp

TL;DR
ReEXplore is a training-free framework that enhances embodied exploration in MLLMs by using retrospective experience replay and hierarchical frontier selection, significantly improving success rates and efficiency.
Contribution
It introduces a novel training-free method combining retrospective experience replay and hierarchical frontier selection for better embodied exploration with MLLMs.
Findings
Up to 3x higher success rate compared to baselines
Significant improvements in navigation efficiency
Effective across multiple exploration benchmarks
Abstract
Embodied exploration is a target-driven process that requires embodied agents to possess fine-grained perception and knowledge-enhanced decision making. While recent attempts leverage MLLMs for exploration due to their strong perceptual and reasoning abilities, we find that MLLM-based embodied agents remain suboptimal in exploring new environments: (i) they rely on profound but stale pre-trained knowledge, (ii) training-based approaches such as imitation learning or reinforcement learning are expensive for long-horizon tasks with sparse outcome rewards, and (iii) frontier-based exploration yields a large, visually nuanced action space that is difficult for MLLMs to make reliable decisions. We address these challenges with ReEXplore, a training-free framework that performs retrospective experience replay to inject distilled, abstract experience at inference time, and hierarchical…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Face Recognition and Perception · Domain Adaptation and Few-Shot Learning
