PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

Shuochen Liu; Junyi Zhu; Long Shu; Junda Lin; Yuhao Chen; Haotian Zhang; Chao Zhang; Derong Xu; Jia Li; Bo Tang; Zhiyu Li; Feiyu Xiong; Enhong Chen; Tong Xu

arXiv:2603.23231 · cs.AI · May 19, 2026

TL;DR

PERMA is a benchmark for evaluating long-term personalized memory in language models. It emphasizes persona consistency over time and realistic variability in user input.

Contribution

The paper presents a benchmark built from temporally ordered interaction events, and adds text variability and linguistic alignment to simulate erratic user inputs and individual idiolects.

Findings

01. Memory systems improve preference extraction by linking related interactions.

02. Advanced systems reduce token usage compared to raw dialogue retrieval.

03. Models still struggle to maintain persona coherence over time and across domains.

Abstract

Empowering large language models with long-term memory is crucial for building agents that adapt to users' evolving needs. Existing evaluations of this capability typically interleave preference-related dialogues with irrelevant conversations, reducing the task to needle-in-a-haystack retrieval while ignoring relationships between events driving user preference evolution. Such settings overlook a fundamental characteristic of real-world personalization: preferences emerge gradually and accumulate across interactions within noisy contexts. To bridge this gap, we introduce PERMA, a benchmark designed to evaluate persona consistency over time beyond static preference recall. Additionally, we incorporate (1) text variability and (2) linguistic alignment to simulate erratic user inputs and individual idiolects in real-world data. PERMA consists of temporally ordered interaction events…

Code & Models

Repositories

PolarisLiu1/PERMA (GitHub)

Datasets

ustclsc/PERMA
dataset · 817 dl

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques · Personal Information Management and User Behavior