Lightweight LLM Agent Memory with Small Language Models

Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Zhenzhen Huang, Pengcheng Zheng, Zhicheng Wang, Ping Guo, Fan Mo, Sung-Ho Bae, Jie Zou, Jiwei Wei, and Yang Yang

arXiv:2604.07798 · cs.AI · April 23, 2026


TL;DR

LightMem is a lightweight memory system for LLM agents that separates online processing from offline consolidation, improving accuracy and reducing latency in long interactions.

Contribution

It introduces a modular memory architecture with short-term, mid-term, and long-term components, optimized for small language models and multi-user settings.
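
As a rough illustration of how a three-tier memory with separate online and offline paths could be laid out, here is a minimal Python sketch. It is not the paper's implementation: the class `TieredMemory`, its method names, the eviction rule, and the word-overlap scoring are all hypothetical stand-ins chosen for brevity, and the SLM-driven retrieval and writing that LightMem describes are replaced here by plain string matching.

```python
from collections import deque


class TieredMemory:
    """Toy three-tier store; every name here is hypothetical."""

    def __init__(self, stm_size=3):
        self.stm = deque(maxlen=stm_size)  # short-term: most recent turns
        self.mtm = []                      # mid-term: turns evicted from STM
        self.ltm = set()                   # long-term: deduplicated entries

    def write(self, turn):
        """Online path: cheap append, no consolidation work."""
        if len(self.stm) == self.stm.maxlen:
            # The oldest turn is about to be evicted, so keep it in MTM.
            self.mtm.append(self.stm[0])
        self.stm.append(turn)

    def retrieve(self, query, k=2):
        """Online path: rank stored entries by word overlap with the query
        (an assumed placeholder for a learned or SLM-based retriever)."""
        words = set(query.lower().split())
        candidates = list(self.stm) + self.mtm + sorted(self.ltm)
        scored = [(len(words & set(c.lower().split())), c) for c in candidates]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [c for _, c in ranked[:k]]

    def consolidate(self):
        """Offline path: fold mid-term entries into long-term memory."""
        self.ltm.update(self.mtm)
        self.mtm.clear()
```

In this sketch, `write` and `retrieve` would run inside the conversation loop, while `consolidate` would be called between sessions, so the cost of reorganizing memory never lands on the online path.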

Findings

01

Achieves a 2.5-point F1 improvement over A-MEM on LoCoMo.

02

Maintains a low median retrieval latency of 83 ms.

03

Provides consistent gains across different model scales.

Abstract

Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. Retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construction and candidate filtering. In contrast, many other systems rely on repeated large-model calls for online memory operations, which improves accuracy but accumulates latency over long interactions. We propose LightMem, a lightweight agent memory system driven by Small Language Models (SLMs). LightMem modularizes memory retrieval, writing, and long-term consolidation, and separates online processing from offline consolidation to enable efficient memory invocation under bounded compute. We organize memory into short-term memory (STM) for immediate conversational…
