Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

Gunshi Gupta; Karmesh Yadav; Zsolt Kira; Yarin Gal; Rahaf Aljundi

arXiv:2510.19732 · cs.AI · December 1, 2025

TL;DR

Memo is a transformer-based architecture and training recipe for reinforcement learning that compresses past experience into memory. By summarizing and retrieving relevant information, it lets embodied agents handle long-horizon tasks efficiently, and it outperforms naive long-context transformers.

Contribution

The paper presents Memo, a novel transformer architecture with integrated memory summarization for RL, improving efficiency and long-term context handling in embodied agents.

Findings

01. Memo outperforms naive long-context transformers.

02. Memo is more compute and storage efficient.

03. Memo generalizes better to longer contexts.

Abstract

To enable embodied agents to operate effectively over extended timeframes, it is crucial to develop models that form and access memories to stay contextualized in their environment. In the current paradigm of training transformer-based policies for embodied sequential decision-making tasks, visual inputs often overwhelm the context limits of transformers, while humans can maintain and utilize a lifetime of experience compressed as memories. Significant compression is possible in principle, as much of the input is irrelevant and can be abstracted. However, existing approaches predominantly focus on either recurrent models with fixed-size memory or transformers with full-context reliance. In this work, we propose Memo, a transformer-based architecture and training recipe for reinforcement learning (RL) on memory-intensive, long-horizon tasks. Memo incorporates the creation and retrieval…
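
The compression argument in the abstract can be made concrete with a back-of-the-envelope calculation. The sketch below is only illustrative: it assumes, hypothetically, that an episode is split into fixed-length segments and that each completed segment is replaced by a fixed number of summary tokens. The function name `retained_tokens` and every parameter value (`tokens_per_step`, `segment_len`, `summary_tokens`) are invented for this example and are not taken from the paper.

```python
from typing import Optional


def retained_tokens(
    num_steps: int,
    tokens_per_step: int,
    segment_len: Optional[int] = None,
    summary_tokens: int = 0,
) -> int:
    """Count the tokens a policy must keep in context after `num_steps` steps.

    With `segment_len=None` the full history is kept (full-context baseline).
    Otherwise each completed segment of `segment_len` steps is assumed to be
    replaced by `summary_tokens` tokens; the current, unfinished segment is
    kept uncompressed. This is a hypothetical scheme, not the paper's recipe.
    """
    if segment_len is None:
        return num_steps * tokens_per_step
    full_segments, remainder = divmod(num_steps, segment_len)
    return full_segments * summary_tokens + remainder * tokens_per_step


# Illustrative numbers only: 1000 steps, 4 tokens per step.
full = retained_tokens(1000, tokens_per_step=4)
compressed = retained_tokens(1000, tokens_per_step=4, segment_len=100, summary_tokens=8)
print(full, compressed)  # 4000 80
```

Under these made-up settings the retained context grows by 8 tokens per 100 steps instead of 400, which is the kind of saving that makes very long episodes fit within a transformer's context limit. The actual ratio depends on choices this page does not specify.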

Code & Models

Datasets

ykarmesh/ExtObjNav_HSSD_Diverse
dataset · 4 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Human Pose and Action Recognition