Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents

Natchanon Pollertlam; Witchayut Kornsuwannawit

arXiv:2603.04814·cs.CL·March 6, 2026

Open Access

TL;DR

This paper compares a fact-based memory system with long-context LLM inference for persistent AI agents, measuring accuracy and cumulative API cost on three benchmarks to guide deployment choices.

Contribution

The paper provides a cost-performance analysis of fact-based memory versus long-context inference, contributing a cost model that accounts for prompt caching and empirical benchmark results to support deployment decisions.

Findings

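01. Long-context GPT-5-mini achieves higher factual recall than the memory system on LongMemEval and LoCoMo; the memory system is competitive on PersonaMemv2.

02. Once its initial setup cost is paid, the memory system becomes the more cost-effective option at very long contexts.

03. The two architectures have structurally different cost profiles, which should inform the deployment strategy.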

Abstract

Persistent conversational AI systems face a choice between passing full conversation histories to a long-context large language model (LLM) and maintaining a dedicated memory system that extracts and retrieves structured facts. We compare a fact-based memory system built on the Mem0 framework against long-context LLM inference on three memory-centric benchmarks - LongMemEval, LoCoMo, and PersonaMemv2 - and evaluate both architectures on accuracy and cumulative API cost. Long-context GPT-5-mini achieves higher factual recall on LongMemEval and LoCoMo, while the memory system is competitive on PersonaMemv2, where persona consistency depends on stable, factual attributes suited to flat-typed extraction. We construct a cost model that incorporates prompt caching and show that the two architectures have structurally different cost profiles: long-context inference incurs a per-turn charge…
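
The contrast between the two cost profiles can be illustrated with a toy calculation. The sketch below is not the paper's cost model. The prices, the token counts, the `Prices` class, and both function names are hypothetical assumptions chosen only to show the shape of the trade-off. It assumes that long-context inference re-reads the whole history each turn at a discounted cached-token rate, and that a memory system pays a roughly constant amount per turn for one extraction call and one answer call.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Hypothetical per-token prices (not taken from the paper)."""
    input_per_token: float    # uncached input token
    cached_per_token: float   # input token served from the prompt cache
    output_per_token: float   # generated token


def long_context_cost(turns: int, msg_tokens: int, reply_tokens: int,
                      p: Prices) -> float:
    """Cumulative cost when the full history is resent on every turn.

    Assumption: the prior history is a cached prefix, and only the new
    user message is billed at the uncached input rate.
    """
    total = 0.0
    history = 0  # tokens accumulated in the conversation so far
    for _ in range(turns):
        total += history * p.cached_per_token
        total += msg_tokens * p.input_per_token
        total += reply_tokens * p.output_per_token
        history += msg_tokens + reply_tokens
    return total


def memory_cost(turns: int, msg_tokens: int, reply_tokens: int,
                retrieved_tokens: int, extracted_tokens: int,
                p: Prices) -> float:
    """Cumulative cost for a fact-extraction memory system.

    Assumption: each turn makes one answer call (message plus retrieved
    facts) and one extraction call that reads the turn and writes facts.
    """
    total = 0.0
    for _ in range(turns):
        # Answer call: new message plus a bounded set of retrieved facts.
        total += (msg_tokens + retrieved_tokens) * p.input_per_token
        total += reply_tokens * p.output_per_token
        # Extraction call: read the finished turn, emit structured facts.
        total += (msg_tokens + reply_tokens) * p.input_per_token
        total += extracted_tokens * p.output_per_token
    return total


if __name__ == "__main__":
    prices = Prices(input_per_token=1e-6, cached_per_token=1e-7,
                    output_per_token=2e-6)
    for n in (10, 100, 1000):
        lc = long_context_cost(n, 100, 100, prices)
        mem = memory_cost(n, 100, 100, 200, 50, prices)
        print(f"turns={n:5d}  long_context={lc:.4f}  memory={mem:.4f}")
```

Under these assumed numbers the long-context total grows roughly quadratically with the number of turns, because each turn pays for the whole accumulated history, while the memory total grows linearly. Short conversations therefore favor long-context inference and very long ones favor the memory system. Where the crossover falls depends entirely on the assumed prices and token counts.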

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare