ECHO: Continuous Hierarchical Memory for Vision-Language-Action Models

Yanbin Hu; Jin Cui; Jiayi Lu; Ruixuan Yang; Jun Ye; Boran Zhao; Xingyu Chen; Xuguang Lan; and Pengju Ren

arXiv:2605.10993 · cs.RO · May 13, 2026

TL;DR

ECHO is a hierarchical memory framework that uses a hyperbolic autoencoder to improve experience retrieval and generalization in vision-language-action (VLA) models on long-horizon manipulation tasks.

Contribution

The paper presents a hierarchical memory system that organizes experience vectors in a continuous hierarchical space, with the goal of improving long-horizon task performance and generalization in VLA models.

Findings

01. Achieved a 12.8% improvement in success rate on LIBERO-Long tasks.

02. Enhanced compositional generalization on unseen long-horizon tasks.

03. Demonstrated effectiveness in real-world experiments.

Abstract

Memory capacity is a critical factor determining the performance of Vision-Language-Action (VLA) models in long-horizon manipulation tasks. Existing memory-augmented architectures primarily rely on linear or flat storage, lacking structural priors for manipulation categories and hierarchical organization. This deficiency hinders efficient experience retrieval and limits generalization to unseen long-horizon task compositions. Inspired by the hierarchical organization of human experience, we propose ECHO (Experience Consolidation and Hierarchical Organization), a novel memory framework operating within a Continuous Hierarchical Space. By employing a hyperbolic autoencoder, ECHO maps VLA hidden states into this space. Leveraging hyperbolic metrics and entailment constraint mechanisms, experience vectors are organized into a semantic memory tree that supports efficient top-down retrieval.…
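To make the idea of top-down retrieval under a hyperbolic metric concrete, here is a minimal sketch. It is not the authors' implementation. The abstract does not say which model of hyperbolic space ECHO uses, so the Poincaré ball is an assumption here. The `MemoryNode` class, the `retrieve_top_down` function, the 2-D embeddings, and the node labels are all hypothetical illustrations. The autoencoder and the entailment constraints are not modeled.

```python
import math
from dataclasses import dataclass, field


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    ASSUMPTION: the Poincare ball model; the abstract only says "hyperbolic metrics".
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


@dataclass
class MemoryNode:
    """Hypothetical node of a memory tree: a label, an embedding, and children."""
    label: str
    embedding: tuple
    children: list = field(default_factory=list)


def retrieve_top_down(root, query):
    """Greedy descent: at each level, follow the child nearest to the query."""
    node = root
    path = [node.label]
    while node.children:
        node = min(node.children,
                   key=lambda c: poincare_distance(c.embedding, query))
        path.append(node.label)
    return path


# Toy tree (hypothetical labels and coordinates): general concepts sit near
# the origin, specific experiences sit closer to the boundary of the ball.
root = MemoryNode("root", (0.0, 0.0), [
    MemoryNode("pick", (0.5, 0.0), [
        MemoryNode("pick mug", (0.8, 0.1)),
        MemoryNode("pick bowl", (0.8, -0.1)),
    ]),
    MemoryNode("open", (-0.5, 0.0), [
        MemoryNode("open drawer", (-0.8, 0.1)),
    ]),
])

print(retrieve_top_down(root, (0.75, 0.08)))  # ['root', 'pick', 'pick mug']
```

In this sketch the descent compares the query only against the children of the current node, so the number of distance evaluations grows with the depth and branching factor of the tree, not with the total number of stored experiences. That is the usual motivation for tree-structured retrieval over a flat store.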
