ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks
Samuel Sameer Tanguturi

TL;DR
This paper compares existing memory benchmarks against the continuity properties defined in ATANT v1.0 and finds that they measure something different, which motivates a dedicated continuity evaluation.
Contribution
It provides a structural analysis of existing benchmarks, identifies methodological flaws, and clarifies that current evaluations do not measure continuity as defined in ATANT v1.0.
Findings
Of the 7 required continuity properties, the median existing benchmark covers 1, and none covers more than 2.
Methodological defects, including a bug in LOCOMO, compromise their validity.
ATANT's own evaluation reports a high score (96%), consistent with its focus on continuity.
Abstract
ATANT v1.0 (arXiv:2604.06710) defined continuity as a system property with 7 required properties and introduced a 10-checkpoint, LLM-free evaluation methodology validated on a 250-story corpus. Since publication, a recurring reviewer and practitioner question has concerned not the framework itself but its relationship to a wider set of memory evaluations: LOCOMO, LongMemEval, BEAM, MemoryBench, Zep's evaluation suite, Letta/MemGPT's evaluations, and RULER. This companion paper, v1.1, does not modify the v1.0 standard. It closes a related-work gap that v1.0 left brief under page limits. We show by structural analysis that none of these benchmarks measures continuity as defined in v1.0: of the 7 required properties, the median existing eval covers 1 property, the mean covers 0.43 when partial credit is scored at 0.5, and no eval covers more than 2. We provide a cell-by-cell…
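The partial-credit coverage scoring mentioned in the abstract can be sketched as below. This is a minimal illustration, not the paper's implementation: the cell labels (`full`, `partial`, `none`), the benchmark names, and every value in `toy_matrix` are hypothetical placeholders, and the sketch assumes a benchmark's score is the plain sum of credits over the 7 properties, which the abstract does not confirm.

```python
from statistics import mean, median

# Credit per cell; the abstract states only that partial credit is scored at 0.5.
# The label names themselves are an assumption made for this sketch.
CREDIT = {"full": 1.0, "partial": 0.5, "none": 0.0}


def coverage_score(cells):
    """Sum the credit over one benchmark's per-property cells."""
    return sum(CREDIT[cell] for cell in cells)


# Hypothetical placeholder matrix: one row per benchmark, 7 cells per row
# (one per required continuity property). These are NOT the paper's results.
toy_matrix = {
    "bench_a": ["full", "partial", "none", "none", "none", "none", "none"],
    "bench_b": ["partial", "none", "none", "none", "none", "none", "none"],
    "bench_c": ["none", "none", "none", "none", "none", "none", "none"],
}

scores = {name: coverage_score(cells) for name, cells in toy_matrix.items()}
values = list(scores.values())

# Aggregate statistics of the kind the abstract reports (median, mean, max).
summary = {"median": median(values), "mean": mean(values), "max": max(values)}
```

Summing raw credits keeps each score readable as "number of properties covered"; dividing by 7 would give a fraction instead, and which convention the paper uses is not stated in the text above.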
