Structured Distillation for Personalized Agent Memory: 11x Token Reduction with Retrieval Preservation
Sydney Lewis

TL;DR
This paper introduces a structured distillation method that compresses one user's conversation history with an AI agent by 11x while retaining 96% of verbatim retrieval performance, enabling efficient long-term agent memory.
Contribution
The paper presents a structured distillation approach that compresses each exchange into a compact four-field object, cutting average exchange length from 371 to 38 tokens with little loss in retrieval quality and making personalized agent memory practical at scale.
Findings
Achieves 11x compression of conversation history.
Maintains 96% of verbatim retrieval performance.
Enables fitting thousands of exchanges within a single prompt.
Abstract
Long conversations with an AI agent create a simple problem for one user: the history is useful, but carrying it verbatim is expensive. We study personalized agent memory: one user's conversation history with an agent, distilled into a compact retrieval layer for later search. Each exchange is compressed into a compound object with four fields (exchange_core, specific_context, thematic room_assignments, and regex-extracted files_touched). The searchable distilled text averages 38 tokens per exchange. Applied to 4,182 conversations (14,340 exchanges) from 6 software engineering projects, the method reduces average exchange length from 371 to 38 tokens, yielding 11x compression. We evaluate whether personalized recall survives that compression using 201 recall-oriented queries, 107 configurations spanning 5 pure and 5 cross-layer search modes, and 5 LLM graders (214,519 consensus-graded…
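As an illustration only, the four-field compound object described in the abstract might be sketched as follows. The field names come from the abstract; everything else is an assumption made for this sketch, including the regex pattern, the `extract_files` helper, and the choice to build the searchable text from `exchange_core` plus `specific_context`. The abstract does not specify the paper's actual pattern or how the first three fields are produced, so they are filled in by hand here.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern (not from the paper): path-like tokens that end
# in a short alphabetic extension, e.g. "src/auth/login.py".
FILE_PATTERN = re.compile(r"\b[\w./-]+\.[A-Za-z]{1,5}\b")


@dataclass
class DistilledExchange:
    """One exchange distilled into the four fields named in the abstract."""
    exchange_core: str
    specific_context: str
    room_assignments: list = field(default_factory=list)
    files_touched: list = field(default_factory=list)

    def searchable_text(self) -> str:
        # Assumption: the searchable text is the two free-text fields joined.
        return f"{self.exchange_core} {self.specific_context}"


def extract_files(raw_exchange: str) -> list:
    """Return file-like tokens in first-seen order, without duplicates."""
    seen = []
    for match in FILE_PATTERN.findall(raw_exchange):
        if match not in seen:
            seen.append(match)
    return seen


raw = "Fixed the bug in src/auth/login.py and updated README.md."
distilled = DistilledExchange(
    exchange_core="Fixed login bug",          # hand-written for illustration
    specific_context="null session token",    # hand-written for illustration
    room_assignments=["authentication"],      # hand-written for illustration
    files_touched=extract_files(raw),
)
```

Keeping `files_touched` as a deterministic regex pass, separate from the free-text fields, means file lookups do not depend on how the summary happens to be worded.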
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Web Data Mining and Analysis
