KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning
Zebin Yang, Tong Xie, Baotong Lu, Shaoshan Liu, Bo Yu, Meng Li

TL;DR
KEEP is a KV-cache-centric memory management system for LLM-based embodied planning. It reduces KV cache recomputation and balances memory loading, yielding a 2.68x speedup over text-based memory methods and a 4.13% success rate improvement over CacheBlend.
Contribution
The paper presents three memory management algorithms that make KV cache usage in embodied planning tasks more efficient and more effective.
Findings
2.68x speedup over text-based memory methods
4.13% success rate improvement over CacheBlend
1.90x reduction in time-to-first-token
Abstract
Memory-augmented Large Language Models (LLMs) have demonstrated remarkable capability for complex and long-horizon embodied planning. By keeping track of past experiences and environmental states, memory enables LLMs to maintain a global view, thereby avoiding repetitive exploration. However, existing approaches often store the memory as raw text, leading to excessively long prompts and high prefill latency. While it is possible to store and reuse the KV caches, the efficiency benefits are greatly undermined due to frequent KV cache updates. In this paper, we propose KEEP, a KV-cache-centric memory management system for efficient embodied planning. KEEP features 3 key innovations: (1) a Static-Dynamic Memory Construction algorithm that reduces KV cache recomputation by mixed-granularity memory group; (2) a Multi-hop Memory Re-computation algorithm that dynamically identifies important…
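The abstract's point that frequent KV cache updates undermine reuse can be illustrated with a toy cost model. The sketch below is not KEEP's algorithm: the class `GroupKVCache`, the whitespace "tokenizer", and the example memory strings are all hypothetical, and the model ignores attention between groups, which a real system has to account for. It only counts how many tokens would need prefill when memory is stored as one block versus split into rarely changing (static) and frequently changing (dynamic) groups.

```python
import hashlib


class GroupKVCache:
    """Toy stand-in for a per-group KV cache (hypothetical, for illustration).

    Instead of storing KV tensors, it stores a token count per memory group
    and tracks how many tokens had to be recomputed (prefilled).
    """

    def __init__(self):
        self._store = {}            # content hash -> token count
        self.recomputed_tokens = 0  # running prefill cost

    def prefill(self, groups):
        """Process a list of memory groups, recomputing only unseen ones."""
        for text in groups:
            key = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if key not in self._store:
                n_tokens = len(text.split())  # assumption: whitespace tokens
                self._store[key] = n_tokens
                self.recomputed_tokens += n_tokens


# Hypothetical memory contents for two consecutive planning steps.
static_memory = [
    "kitchen has a fridge and a sink",
    "bedroom has a bed and a lamp",
]
dynamic_step1 = "robot holds nothing"
dynamic_step2 = "robot holds an apple"

# Monolithic layout: all memory in one group, so any change invalidates it.
mono = GroupKVCache()
mono.prefill([" ".join(static_memory + [dynamic_step1])])
mono.prefill([" ".join(static_memory + [dynamic_step2])])

# Split layout: static groups are cached once; only the dynamic group changes.
split = GroupKVCache()
split.prefill(static_memory + [dynamic_step1])
split.prefill(static_memory + [dynamic_step2])

print("monolithic recomputed tokens:", mono.recomputed_tokens)
print("split recomputed tokens:", split.recomputed_tokens)
```

In this toy setting the split layout recomputes only the changed dynamic group at the second step, while the monolithic layout recomputes everything. How KEEP actually forms its mixed-granularity groups and selects tokens for re-computation is described in the paper, not here.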
Taxonomy
Topics: AI-based Problem Solving and Planning · Multimodal Machine Learning Applications · Artificial Intelligence in Games
