Chameleon: Episodic Memory for Long-Horizon Robotic Manipulation
Xinying Guo, Chenxi Jiang, Hyun Bin Kim, Ying Sun, Yang Xiao, Yuhang Han, Jianfei Yang

TL;DR
Chameleon introduces a geometry-grounded episodic memory system for robotic manipulation. It writes multimodal tokens to memory and recalls them through a differentiable memory stack, which improves decision reliability and long-horizon control in environments with perceptual aliasing.
Contribution
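It proposes an episodic memory architecture inspired by human episodic memory. The architecture writes geometry-grounded multimodal tokens and produces goal-directed recall through a differentiable memory stack. It also introduces Camo-Dataset, a real-robot UR5e dataset.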
Findings
Improves decision reliability in perceptually aliased environments.
Enhances long-horizon control over baseline methods.
Demonstrates effectiveness on Camo-Dataset, a new real-robot (UR5e) dataset covering episodic recall, spatial tracking, and sequential manipulation.
Abstract
Robotic manipulation often requires memory: occlusion and state changes can make decision-time observations perceptually aliased, making action selection non-Markovian at the observation level because the same observation may arise from different interaction histories. Most embodied agents implement memory via semantically compressed traces and similarity-based retrieval, which discards disambiguating fine-grained perceptual cues and can return perceptually similar but decision-irrelevant episodes. Inspired by human episodic memory, we propose Chameleon, which writes geometry-grounded multimodal tokens to preserve disambiguating context and produces goal-directed recall through a differentiable memory stack. We also introduce Camo-Dataset, a real-robot UR5e dataset spanning episodic recall, spatial tracking, and sequential manipulation under perceptual aliasing. Across tasks, Chameleon…
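The summary does not specify how Chameleon's differentiable memory stack is built. For readers unfamiliar with the idea, here is a minimal sketch of one well-known way to make a stack differentiable: the continuous stack of Grefenstette et al. (2015). This is a swapped-in technique, not the authors' module. Push and pop are real-valued strengths rather than discrete operations, so the read-out is a piecewise-linear (sub-differentiable) function of the controller's outputs. The class name `SoftStack` is hypothetical. The stored `value` vectors stand in for hypothetical memory-token embeddings. The code only runs the forward pass in pure Python; training would require an autodiff framework.

```python
from typing import List

Vector = List[float]


class SoftStack:
    """Continuous stack after Grefenstette et al. (2015).

    Hypothetical illustration only; NOT Chameleon's actual memory stack.
    Each entry has a value vector and a strength in [0, 1].
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.values: List[Vector] = []
        self.strengths: List[float] = []

    def step(self, value: Vector, push: float, pop: float) -> Vector:
        # 1) Soft pop: remove `pop` units of strength, starting at the top.
        #    Equivalent to s[i] = max(0, s[i] - max(0, pop - sum_{j>i} s[j])).
        remaining = pop
        for i in reversed(range(len(self.strengths))):
            removed = min(self.strengths[i], remaining)
            self.strengths[i] -= removed
            remaining -= removed
        # 2) Soft push: append the new value with strength `push`.
        self.values.append(list(value))
        self.strengths.append(push)
        # 3) Read the top 1.0 unit of strength as a weighted sum of values.
        return self.read()

    def read(self) -> Vector:
        out = [0.0] * self.dim
        budget = 1.0  # total read weight still available
        for v, s in zip(reversed(self.values), reversed(self.strengths)):
            w = min(s, budget)  # weight given to this entry
            for k in range(self.dim):
                out[k] += w * v[k]
            budget -= w
            if budget <= 0.0:
                break
        return out


if __name__ == "__main__":
    stack = SoftStack(dim=2)
    print(stack.step([1.0, 0.0], push=1.0, pop=0.0))  # [1.0, 0.0]
    print(stack.step([0.0, 1.0], push=0.5, pop=0.0))  # [0.5, 0.5] (half-strength push blends)
    print(stack.step([0.0, 0.0], push=0.0, pop=0.5))  # [1.0, 0.0] (soft pop restores older entry)
```

Because push and pop are fractional, a controller can learn how strongly to store or discard an entry. A discrete stack would need non-differentiable decisions for the same choices.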
Taxonomy
Topics: Social Robot Interaction and HRI · Robot Manipulation and Learning · Multimodal Machine Learning Applications
