According to Me: Long-Term Personalized Referential Memory QA

Jingbiao Mei; Jinghong Chen; Guangyu Yang; Xinyu Hou; Margaret Li; Bill Byrne

arXiv:2603.01990·cs.AI·March 3, 2026




TL;DR

This paper introduces ATM-Bench, a comprehensive benchmark for evaluating multimodal, multi-source personalized memory question answering, highlighting the challenges and potential improvements in long-term personalized AI assistant memory systems.

Contribution

The paper presents ATM-Bench, the first benchmark for multimodal, multi-source personalized memory QA, and proposes Schema-Guided Memory to better represent diverse memory sources.
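To make the idea of a structured memory representation concrete, here is a minimal sketch of how items from different sources might be normalized into one schema and then filtered. It is an illustration only, not the paper's SGM implementation: the `MemoryItem` class, its field names, the `from_email` and `from_image` helpers, the `select` function, and all sample data are assumptions invented for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class MemoryItem:
    """Hypothetical shared schema; field names are illustrative, not from the paper."""
    item_id: str
    source: str            # e.g. "image", "video", or "email"
    timestamp: datetime
    fields: Dict[str, str] = field(default_factory=dict)


def from_email(item_id: str, sent_at: datetime, sender: str, subject: str) -> MemoryItem:
    # Map email-specific attributes onto the shared schema.
    return MemoryItem(item_id, "email", sent_at, {"sender": sender, "subject": subject})


def from_image(item_id: str, taken_at: datetime, location: str, caption: str) -> MemoryItem:
    # Map image-specific attributes onto the shared schema.
    return MemoryItem(item_id, "image", taken_at, {"location": location, "caption": caption})


def select(items: List[MemoryItem],
           source: Optional[str] = None,
           start: Optional[datetime] = None,
           end: Optional[datetime] = None) -> List[MemoryItem]:
    """Filter items by source and inclusive time range, oldest first."""
    hits = []
    for item in items:
        if source is not None and item.source != source:
            continue
        if start is not None and item.timestamp < start:
            continue
        if end is not None and item.timestamp > end:
            continue
        hits.append(item)
    return sorted(hits, key=lambda m: m.timestamp)


# Made-up sample data, for illustration only.
memory = [
    from_email("e1", datetime(2024, 5, 2, 9, 0), "airline@example.com", "Your booking to Lisbon"),
    from_image("i1", datetime(2024, 5, 10, 18, 30), "Lisbon", "Sunset over the river"),
    from_image("i2", datetime(2023, 1, 4, 12, 0), "Cambridge", "Snow on the college lawn"),
]
may_2024 = select(memory, start=datetime(2024, 5, 1), end=datetime(2024, 5, 31))
print([m.item_id for m in may_2024])
```

Because every item carries the same top-level fields, a query such as "what happened in May 2024" can gather evidence across emails and images with one filter, which is the kind of multi-source lookup the benchmark targets.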

Findings

1. Current models perform poorly on complex personalized memory tasks.
2. Schema-Guided Memory improves over traditional descriptive memory.
3. State-of-the-art systems achieve under 20% accuracy on challenging tasks.

Abstract

Personalized AI assistants must recall and reason over long-term user memory, which naturally spans multiple modalities and sources such as images, videos, and emails. However, existing long-term memory benchmarks focus primarily on dialogue history and fail to capture realistic personalized references grounded in lived experience. We introduce ATM-Bench, the first benchmark for multimodal, multi-source personalized referential memory QA. ATM-Bench contains approximately four years of privacy-preserving personal memory data and human-annotated question-answer pairs with ground-truth memory evidence, including queries that require resolving personal references, reasoning over multiple pieces of evidence from multiple sources, and handling conflicting evidence. We propose Schema-Guided Memory (SGM) to structurally represent memory items originating from different sources. In experiments, we implement 5…


Code & Models

Datasets

Jingbiao/ATM-Bench (dataset, 4.3k downloads)


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks