Knowledge Access Beats Model Size: Memory Augmented Routing for Persistent AI Agents
Xunzhuo Liu, Bowei He, Xue Liu, Andy Luo, Haichen Zhang, Huamin Chen

TL;DR
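This paper shows that giving a lightweight 8B-parameter model retrieved conversational memory improves both accuracy and cost on user-specific queries. Without additional training, it outperforms a 235B model that has no memory and recovers 69% of a full-context 235B model's performance at 96% lower effective cost.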
Contribution
The paper introduces a memory-augmented inference framework in which a lightweight 8B-parameter model answers queries using retrieved conversational context, reducing cost while recovering much of a large model's performance without extra training or labeled data.
Findings
The memory-augmented 8B model reaches 30.5% F1, recovering 69% of the performance of a full-context 235B model at 96% lower effective cost.
Memory improves correctness by grounding responses in relevant user-specific information.
Retrieval quality directly impacts system performance, with hybrid retrieval methods further boosting accuracy.
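To make the hybrid retrieval finding concrete, here is a minimal Python sketch of one common way to blend a lexical score with a similarity score when ranking stored memories. It is an illustration under stated assumptions, not the paper's method: the function names, the `alpha` weight, and the use of character trigrams as a stand-in for a dense embedding model are all hypothetical choices made so the example runs with only the standard library.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, doc):
    """Lexical signal: Jaccard overlap of the two word sets."""
    q, d = set(tokens(query)), set(tokens(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def trigram_vector(text):
    """Character-trigram counts; an assumed stand-in for a dense embedding."""
    s = " ".join(tokens(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_rank(query, memories, alpha=0.5, top_k=2):
    """Rank memories by alpha * lexical + (1 - alpha) * vector similarity."""
    qv = trigram_vector(query)
    scored = [
        (alpha * keyword_score(query, m)
         + (1 - alpha) * cosine(qv, trigram_vector(m)), m)
        for m in memories
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical memories, invented purely for illustration.
memories = [
    "User's dog is named Biscuit.",
    "User prefers dark roast coffee.",
    "User works at a hospital in Lyon.",
]
ranked = hybrid_rank("What is my dog's name?", memories)
```

In a real system the trigram vector would typically be replaced by an embedding model, and `alpha` would be tuned on held-out queries; both are left as assumptions here.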
Abstract
Production AI agents frequently receive user-specific queries that are highly repetitive, with up to 47% being semantically similar to prior interactions, yet each query is typically processed with the same computational cost. We argue that this redundancy can be exploited through conversational memory, transforming repetition from a cost burden into an efficiency advantage. We propose a memory-augmented inference framework in which a lightweight 8B-parameter model leverages retrieved conversational context to answer all queries via a low-cost inference path. Without any additional training or labeled data, this approach achieves 30.5% F1, recovering 69% of the performance of a full-context 235B model while reducing effective cost by 96%. Notably, a 235B model without memory (13.7% F1) underperforms even the standalone 8B model (15.4% F1), indicating that for user-specific…
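The inference path described in the abstract (retrieve prior conversational context, then let a small model answer with it) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: `ConversationMemory`, `answer_with_memory`, the word-overlap retriever, and the prompt layout are hypothetical, and `small_model` is any callable that maps a prompt string to a response, standing in for the 8B model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


def _tokens(text: str) -> set:
    """Lowercase alphanumeric word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ConversationMemory:
    """Hypothetical store of past conversation turns."""
    turns: List[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        self.turns.append(turn)

    def retrieve(self, query: str, top_k: int = 3) -> List[str]:
        """Return up to top_k turns sharing at least one word with the query."""
        q = _tokens(query)
        scored = [(len(q & _tokens(t)), t) for t in self.turns]
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [t for _, t in hits[:top_k]]


def answer_with_memory(
    query: str,
    memory: ConversationMemory,
    small_model: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Build a prompt from retrieved turns and pass it to the small model."""
    context = memory.retrieve(query, top_k)
    context_block = "\n".join(f"- {c}" for c in context)
    prompt = (
        "Relevant past conversation:\n"
        f"{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return small_model(prompt)


# Usage with an echo function standing in for the small model (assumption).
memory = ConversationMemory()
memory.add("My dog is called Biscuit.")
memory.add("I drink dark roast coffee.")
echoed_prompt = answer_with_memory("What is my dog called?", memory, lambda p: p)
```

Because no training is involved, the only moving parts in this sketch are the retriever and the prompt template, which is consistent with the abstract's point that access to the right context matters more than model size for user-specific queries.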
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Information Retrieval and Search Behavior
