DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA
Jianing Yin, Tan Tang

TL;DR
DeferMem introduces a reinforcement-learning-based framework that improves long-term memory question answering by distilling query-relevant evidence from scattered conversational histories.
Contribution
It decouples candidate retrieval from evidence distillation, using reinforcement learning to optimize query-specific evidence extraction for better accuracy and efficiency.
Findings
DeferMem outperforms strong baselines in QA accuracy.
It achieves faster runtime and lower memory costs.
It attains the highest QA accuracy with zero API token cost.
Abstract
Large language model (LLM) agents still struggle with long-term memory question answering, where answer-supporting evidence is often scattered across long conversational histories and buried in substantial irrelevant content. Existing memory systems typically process memory before future queries are known, then retrieve the resulting units based on similarity rather than their utility for answering the query. This workflow leaves downstream answerers to denoise retrieved candidates and reconstruct query-specific evidence. We present DeferMem, a long-term memory framework that decouples this problem into high-recall candidate retrieval and query-conditioned evidence distillation. DeferMem uses a lightweight segment-link structure to organize raw history and retrieve broad candidates at query time. It then applies a memory distiller trained with DistillPO, our reinforcement learning…
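The two-stage workflow described in the abstract can be sketched in Python as below. The first stage is broad candidate retrieval over a segment-link structure, and the second is query-conditioned distillation. This is a toy illustration under stated assumptions, not the authors' implementation. The following are all hypothetical:

- the fixed-size segmentation;
- the neighbour-only links;
- the lexical overlap scorer;
- the names `Segment`, `build_segments`, `retrieve_candidates`, and `distill`.

In particular, `distill` is a keyword heuristic that stands in for the DistillPO-trained distiller, whose details are not given here.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stopword list for the toy lexical scorer (not from the paper).
STOPWORDS = {"a", "an", "the", "is", "are", "what", "of", "s", "to", "i", "it"}


@dataclass
class Segment:
    seg_id: int
    turns: list[str]
    links: list[int] = field(default_factory=list)  # ids of linked segments


def tokens(text: str) -> set[str]:
    """Lowercased word tokens with stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def build_segments(history: list[str], size: int = 2) -> list[Segment]:
    """Assumed segment-link layout: fixed-size runs of raw turns,
    each segment linked to its immediate neighbours."""
    segs = [Segment(i // size, history[i:i + size]) for i in range(0, len(history), size)]
    for s in segs:
        s.links = [j for j in (s.seg_id - 1, s.seg_id + 1) if 0 <= j < len(segs)]
    return segs


def retrieve_candidates(segs: list[Segment], query: str, k: int = 1) -> list[Segment]:
    """Stage 1 (high recall): take the top-k segments by token overlap,
    then expand along links so nearby context is not lost."""
    q = tokens(query)
    ranked = sorted(segs, key=lambda s: len(q & tokens(" ".join(s.turns))), reverse=True)
    ids: set[int] = set()
    for s in ranked[:k]:
        ids.add(s.seg_id)
        ids.update(s.links)
    return [segs[i] for i in sorted(ids)]


def distill(candidates: list[Segment], query: str) -> list[str]:
    """Stage 2 placeholder: keep only turns sharing a content word with the query.
    DeferMem uses a distiller trained with DistillPO here; this heuristic only
    marks where that model sits in the pipeline."""
    q = tokens(query)
    return [t for s in candidates for t in s.turns if q & tokens(t)]


history = [
    "Alice: I adopted a cat last week.",
    "Bob: Congrats! Are you still learning Portuguese?",
    "Alice: Yes. Also, the cat's name is Miso.",
    "Bob: Nice, the weather is great today.",
    "Alice: I started a new job at the library.",
    "Bob: How is it going?",
    "Alice: Slowly, the shelving system is confusing.",
    "Bob: Keep at it.",
]
query = "What is the cat's name?"

segs = build_segments(history)
cands = retrieve_candidates(segs, query)
print([s.seg_id for s in cands])  # [0, 1, 2]
print(distill(cands, query))
```

In this toy run, segment 1 scores highest. Link expansion then pulls in segment 0, which holds the earlier supporting turn "I adopted a cat last week". The keyword filter then drops the four unrelated turns among the candidates. Keeping stage 1 cheap and recall-oriented, and deferring the filtering to a query-aware stage 2, mirrors the decoupling the abstract describes.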
