TL;DR
EviMem introduces an iterative retrieval framework that explicitly identifies evidence gaps to improve long-term conversational memory across multiple sessions.
Contribution
It combines sufficiency-based gap diagnosis with a layered memory architecture for targeted query refinement in retrieval tasks.
Findings
On LoCoMo, EviMem improves Judge Accuracy over MIRIX on temporal questions from 73.3% to 81.6%.
On multi-hop questions, Judge Accuracy over MIRIX rises from 65.9% to 85.2%.
EviMem achieves these improvements at 4.5x lower latency.
Abstract
Long-term conversational memory requires retrieving evidence scattered across multiple sessions, yet single-pass retrieval fails on temporal and multi-hop questions. Existing iterative methods refine queries via generated content or document-level signals, but none explicitly diagnoses the evidence gap, namely what is missing from the accumulated retrieval set, leaving query refinement untargeted. We present EviMem, combining IRIS (Iterative Retrieval via Insufficiency Signals), a closed-loop framework that detects evidence gaps through sufficiency evaluation, diagnoses what is missing, and drives targeted query refinement, with LaceMem (Layered Architecture for Conversational Evidence Memory), a coarse-to-fine memory hierarchy supporting fine-grained gap diagnosis. On LoCoMo, EviMem improves Judge Accuracy over MIRIX on temporal (73.3% to 81.6%) and multi-hop (65.9% to 85.2%) questions…
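The closed loop described in the abstract (evaluate sufficiency, diagnose what is missing, refine the query toward the gap) can be illustrated with a toy sketch. This is not the authors' implementation: every function name here (`tokenize`, `retrieve`, `diagnose_gap`, `iterative_retrieve`) is hypothetical, the retriever is plain word overlap, and the sufficiency check is a hand-supplied set of required terms standing in for whatever evaluator the paper actually uses. The layered LaceMem hierarchy is not modeled; memory is a flat list.

```python
from __future__ import annotations


def tokenize(text: str) -> set[str]:
    """Lowercase the words and strip trailing punctuation (toy tokenizer)."""
    return {w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")}


def retrieve(query: str, memory: list[str], k: int = 1) -> list[str]:
    """Toy retriever: return up to k entries ranked by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(memory, key=lambda m: len(q & tokenize(m)), reverse=True)
    return [m for m in ranked[:k] if q & tokenize(m)]


def diagnose_gap(required: set[str], evidence: list[str]) -> set[str]:
    """Assumed stand-in for sufficiency evaluation: required terms not yet covered."""
    covered = set().union(*(tokenize(e) for e in evidence))
    return required - covered


def iterative_retrieve(question: str, required: set[str],
                       memory: list[str], max_rounds: int = 3):
    """Retrieve, check for a gap, and re-query on the missing terms only."""
    evidence: list[str] = []
    query = question
    for round_no in range(1, max_rounds + 1):
        for hit in retrieve(query, memory):
            if hit not in evidence:
                evidence.append(hit)
        gap = diagnose_gap(required, evidence)
        if not gap:                        # evidence judged sufficient: stop early
            return evidence, round_no
        query = " ".join(sorted(gap))      # targeted refinement: ask for what is missing
    return evidence, max_rounds


# Made-up multi-session memory for illustration only.
memory = [
    "Session 1: Alice adopted a dog named Biscuit.",
    "Session 4: Biscuit visited the vet in March.",
    "Session 7: Bob started a pottery class.",
]
evidence, rounds = iterative_retrieve(
    "When did Alice's dog visit the vet?", {"alice", "vet"}, memory
)
print(rounds, evidence)
```

In this toy run the first pass returns only the Session 4 entry, the gap check reports that nothing about Alice has been retrieved, and the second query targets that gap and pulls in Session 1. A single-pass retriever with the same budget would have stopped after the first entry.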
