Evaluating Long-Term Memory for Long-Context Question Answering

Alessandra Terranova; Björn Ross; Alexandra Birch

arXiv:2510.23730 · cs.CL · December 9, 2025


TL;DR

This paper systematically evaluates various memory-augmented methods for long-context question answering in large language models, highlighting their effectiveness and how they scale with model capability.

Contribution

It provides a comprehensive comparison of memory types for long-context QA, revealing which approaches improve efficiency and accuracy across different model sizes.

Findings

1. Memory-augmented approaches reduce token usage by over 90% while maintaining competitive accuracy.

2. RAG benefits foundation models; episodic memory aids instruction-tuned models.

3. Episodic memory helps models recognize their knowledge limits.

Abstract

For large language models to achieve true conversational continuity and benefit from experiential learning, they need memory. While research has focused on the development of complex memory systems, it remains unclear which types of memory are most effective for long-context conversational tasks. We present a systematic evaluation of memory-augmented methods on long-context dialogues annotated for question-answering tasks that require diverse reasoning strategies. We analyse full-context prompting, semantic memory through retrieval-augmented generation and agentic memory, episodic memory through in-context learning, and procedural memory through prompt optimization. Our findings show that memory-augmented approaches reduce token usage by over 90% while maintaining competitive accuracy. Memory architecture complexity should scale with model capability, with foundation models…
