KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions

Tingyu Wu; Zhisheng Chen; Ziyan Weng; Shuhe Wang; Chenglong Li; Shuo Zhang; Sen Hu; Silin Wu; Qizhen Lan; Huacan Wang; Ronghao Chen

arXiv:2601.04745·cs.AI·April 21, 2026

TL;DR

KnowMe-Bench is a new benchmark using autobiographical narratives to evaluate person understanding in AI, emphasizing stable motivations and decision principles beyond simple retrieval accuracy.

Contribution

It introduces a novel long-form autobiographical narrative benchmark with evidence-linked questions for comprehensive person understanding evaluation.

Findings

01. Retrieval-augmented systems improve factual recall.
02. Errors remain in temporally grounded explanations.
03. Memory mechanisms beyond retrieval are needed.

Abstract

Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, where actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval. Our data is in…
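
To make the two ideas in the abstract concrete, the following is a minimal sketch of what a flashback-aware, time-anchored stream and an evidence-linked question could look like as data. It is not the benchmark's actual schema or scoring code: the `Event` and `Question` classes, their field names, the integer `story_time` anchor, the `evidence_recall` measure, and the toy narrative are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical unit of a narrative; not the benchmark's real schema."""
    event_id: str
    narrated_index: int      # position in the narrative as written
    story_time: int          # assumed time anchor, here simply a year
    text: str
    is_flashback: bool = False


@dataclass(frozen=True)
class Question:
    """Hypothetical evidence-linked question."""
    prompt: str
    category: str            # "factual_recall", "subjective_state", or "principle_reasoning"
    evidence_ids: Tuple[str, ...]


def to_time_anchored_stream(events: Sequence[Event]) -> List[Event]:
    """Reorder events by story time; ties keep their narrated order."""
    return sorted(events, key=lambda e: (e.story_time, e.narrated_index))


def evidence_recall(question: Question, retrieved_ids: Sequence[str]) -> float:
    """Fraction of a question's linked evidence found among retrieved events."""
    gold = set(question.evidence_ids)
    if not gold:
        return 0.0
    return len(gold & set(retrieved_ids)) / len(gold)


# Invented toy narrative: the second narrated event is a flashback to 1995.
events = [
    Event("e1", 0, 2010, "I quit my job to open a repair workshop."),
    Event("e2", 1, 1995, "As a child I watched my father fix radios.", is_flashback=True),
    Event("e3", 2, 2012, "The workshop hired its first apprentice."),
]
question = Question(
    prompt="Why did the narrator leave a stable job?",
    category="principle_reasoning",
    evidence_ids=("e1", "e2"),
)

stream = to_time_anchored_stream(events)
print([e.event_id for e in stream])                # ['e2', 'e1', 'e3']
print(evidence_recall(question, ["e1", "e3"]))     # 0.5
```

In this toy case a retriever that returns `e1` and `e3` finds the decision itself but misses the flashback that explains it, which is the kind of gap between factual recall and temporally grounded explanation that the abstract describes.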

Code & Models

Repositories

QuantaAlpha/KnowMeBench
github

Datasets

realty2333/knowMe-Bench
dataset· 9 dl
