MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents
Vishnu Sashank Dorbala, Dinesh Manocha

TL;DR
MemCtrl introduces a novel framework that employs trainable memory gating in Multimodal Large Language Models to enhance online memory management for embodied agents, significantly improving task performance.
Contribution
This work presents MemCtrl, a new approach using trainable memory gates in MLLMs for online memory pruning, tailored for embodied agents with strict memory and compute constraints.
Findings
16% average improvement on EmbodiedBench tasks
Over 20% improvement on specific instruction subsets
Qualitative analysis shows better handling of complex instructions
Abstract
Foundation models rely on in-context learning for personalized decision making. The limited size of this context window necessitates memory compression and retrieval systems like RAG. These systems, however, often treat memory as large offline storage spaces, which is unfavorable for embodied agents that are expected to operate online under strict memory and compute constraints. In this work, we propose MemCtrl, a novel framework that uses Multimodal Large Language Models (MLLMs) for pruning memory online. MemCtrl augments MLLMs with a trainable memory head \mu that acts as a gate to determine which observations or reflections to retain, update, or discard during exploration. We evaluate by training two types of \mu, 1) via an offline expert, and 2) via online RL, and observe significant improvement in overall embodied task completion ability on \mu-augmented MLLMs. In particular, on…
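The retain / update / discard gating described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `GateAction`, `toy_memory_head`, and `GatedMemory` are hypothetical, the gate is a hand-written threshold rule standing in for the trained head \mu, and the reading of "update" as overwriting the most recent entry is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Sequence


class GateAction(Enum):
    RETAIN = "retain"    # store the new entry alongside existing ones
    UPDATE = "update"    # assumption: overwrite the most recent entry
    DISCARD = "discard"  # drop the new entry without storing it


def toy_memory_head(features: Sequence[float]) -> GateAction:
    """Hypothetical stand-in for a trained head mu: thresholds the mean feature."""
    score = sum(features) / len(features)
    if score >= 0.66:
        return GateAction.RETAIN
    if score >= 0.33:
        return GateAction.UPDATE
    return GateAction.DISCARD


@dataclass
class GatedMemory:
    """Bounded online memory whose writes are decided by a gate function."""
    gate: Callable[[Sequence[float]], GateAction]
    capacity: int = 4
    entries: List[str] = field(default_factory=list)

    def step(self, text: str, features: Sequence[float]) -> GateAction:
        action = self.gate(features)
        if action is GateAction.RETAIN:
            self.entries.append(text)
            # Keep memory within its budget by evicting the oldest entry.
            if len(self.entries) > self.capacity:
                self.entries.pop(0)
        elif action is GateAction.UPDATE:
            if self.entries:
                self.entries[-1] = text
            else:
                self.entries.append(text)
        # DISCARD: nothing is written.
        return action


# Usage: feature vectors here are made-up numbers, not model outputs.
memory = GatedMemory(gate=toy_memory_head, capacity=2)
memory.step("saw a mug on the counter", [0.9, 0.8])  # retained
memory.step("mug is now in the sink", [0.5, 0.4])    # overwrites the last entry
memory.step("blank wall", [0.1, 0.0])                # discarded
print(memory.entries)
```

In a real system the gate decision would come from the MLLM-attached head rather than a fixed threshold, and the eviction policy is a separate design choice; oldest-first is used here only because it is the simplest way to keep the memory bounded.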
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Machine Learning in Healthcare
