AstraNav-Memory: Contexts Compression for Long Memory
Botao Ren, Junjun Hu, Xinda Xue, Minghua Luo, Jintao Chen, Haochen Bai, Liangliang You, Mu Xu

TL;DR
This paper introduces AstraNav-Memory, an image-centric memory compression framework for embodied navigation that enhances long-term memory capacity and improves navigation performance in diverse environments.
Contribution
It presents a visual context compression module coupled end-to-end with a Qwen2.5-VL-based navigation policy, enabling scalable long-term memory for embodied agents.
Findings
Achieves state-of-the-art navigation performance on GOAT-Bench and HM3D-OVON.
Supports configurable compression rates; at a representative 16× setting, each image is encoded with about 30 tokens.
Moderate compression balances efficiency and accuracy effectively.
Abstract
Lifelong embodied navigation requires agents to accumulate, retain, and exploit spatial-semantic experience across tasks, enabling efficient exploration in novel environments and rapid goal reaching in familiar ones. While object-centric memory is interpretable, it depends on detection and reconstruction pipelines that limit robustness and scalability. We propose an image-centric memory framework that achieves long-term implicit memory via an efficient visual context compression module coupled end-to-end with a Qwen2.5-VL-based navigation policy. Built atop a ViT backbone with frozen DINOv3 features and lightweight PixelUnshuffle+Conv blocks, our visual tokenizer supports configurable compression rates; for example, under a representative 16× compression setting, each image is encoded with about 30 tokens, expanding the effective context capacity from tens to hundreds of images.…
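The token arithmetic behind the "tens to hundreds of images" claim can be sketched in a few lines. This is a rough illustration, not the authors' implementation: the 20 × 24 patch grid, the 16,000-token memory budget, and both helper names are assumptions chosen only so the numbers line up with the roughly 30 tokens per image quoted in the abstract. The sketch models a PixelUnshuffle-style fold, in which each r × r neighbourhood of patch tokens is merged into one token, so the token count drops by a factor of r².

```python
def tokens_after_unshuffle(grid_h: int, grid_w: int, r: int) -> int:
    """Token count after folding each r x r block of patch tokens into one.

    This mirrors the spatial effect of a PixelUnshuffle with downscale
    factor r: height and width shrink by r, so the count shrinks by r * r.
    """
    if grid_h % r or grid_w % r:
        raise ValueError("grid dimensions must be divisible by r")
    return (grid_h // r) * (grid_w // r)


def images_that_fit(token_budget: int, tokens_per_image: int) -> int:
    """Number of whole images that fit in a fixed visual-token budget."""
    return token_budget // tokens_per_image


# Hypothetical values, not taken from the paper.
GRID_H, GRID_W = 20, 24      # assumed patch grid before compression
MEMORY_BUDGET = 16_000       # assumed token budget reserved for image memory

uncompressed = tokens_after_unshuffle(GRID_H, GRID_W, 1)  # no folding
compressed = tokens_after_unshuffle(GRID_H, GRID_W, 4)    # 4 x 4 fold = 16x fewer tokens

print("tokens per image:", uncompressed, "->", compressed)
print("images in memory:",
      images_that_fit(MEMORY_BUDGET, uncompressed), "->",
      images_that_fit(MEMORY_BUDGET, compressed))
```

Under these assumed numbers, the same budget holds a few dozen uncompressed frames but several hundred compressed ones. Whether the 16× rate is reached in a single fold or in several stacked PixelUnshuffle+Conv stages is not stated in the abstract; the single factor-4 fold here is only the simplest way to get a 16× reduction.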
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
