AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

Jianfei Xiao; Xiang Yu; Chengbing Wang; Wuqiang Zheng; Xinyu Lin; Kaining Liu; Hongxun Ding; Yang Zhang; Wenjie Wang; Fuli Feng; Xiangnan He

arXiv:2603.26680 · cs.CL · May 12, 2026

TL;DR

AlpsBench is a new benchmark derived from real-world human-LLM dialogues for evaluating LLM personalization. It covers four memory-management tasks: extraction, updating, retrieval, and utilization of personalized information.

Contribution

It introduces a realistic, structured benchmark for LLM personalization built from real dialogues. This avoids the distribution gap of synthetic data, and the benchmark evaluates the entire memory lifecycle.

Findings

01

Models struggle to extract latent user traits reliably.

02

Memory updating performance plateaus even in strong models.

03

Retrieval accuracy drops with large distractor pools.
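Finding 03 concerns retrieving the relevant memory from a pool that also contains distractors. The following is a minimal sketch of how top-k ("hit@k") accuracy over such a pool could be computed. It uses a toy token-overlap scorer in place of a real retriever. The names `Memory`, `overlap_score`, `hit_at_k`, and `retrieval_accuracy` are hypothetical, and this page does not state AlpsBench's actual retriever or retrieval metric.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    # Hypothetical record; AlpsBench's real memory schema is not given on this page.
    mem_id: str
    text: str


def overlap_score(query: str, memory: Memory) -> int:
    """Toy relevance score: number of shared lowercase tokens (stand-in for a real retriever)."""
    return len(set(query.lower().split()) & set(memory.text.lower().split()))


def hit_at_k(query: str, gold_id: str, pool: list[Memory], k: int) -> bool:
    """True if the gold memory ranks within the top-k of the candidate pool.

    sorted() is stable, so ties keep their pool order: a distractor that scores
    as high as the gold memory and appears earlier in the pool outranks it.
    """
    ranked = sorted(pool, key=lambda m: overlap_score(query, m), reverse=True)
    return any(m.mem_id == gold_id for m in ranked[:k])


def retrieval_accuracy(cases: list[tuple[str, str, list[Memory]]], k: int) -> float:
    """Fraction of (query, gold_id, pool) cases whose gold memory is retrieved in the top-k."""
    if not cases:
        raise ValueError("need at least one case")
    hits = sum(hit_at_k(query, gold_id, pool, k) for query, gold_id, pool in cases)
    return hits / len(cases)
```

Consider a pool where a distractor such as "user eats meat daily" shares as many query tokens as the gold memory and comes earlier. In that case hit@1 fails while hit@2 succeeds. This is a small-scale illustration of how a near-duplicate distractor can push the relevant memory out of the top rank.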

Abstract

As Large Language Models (LLMs) evolve into lifelong AI assistants, LLM personalization has become a critical frontier. However, progress is currently bottlenecked by the absence of a gold-standard evaluation benchmark. Existing benchmarks either overlook personalized information management that is critical for personalization or rely heavily on synthetic dialogues, which exhibit an inherent distribution gap from real-world dialogue. To bridge this gap, we introduce AlpsBench, An LLM PerSonalization benchmark derived from real-world human-LLM dialogues. AlpsBench comprises 2,500 long-term interaction sequences curated from WildChat, paired with human-verified structured memories that encapsulate both explicit and implicit personalization signals. We define four pivotal tasks (personalized information extraction, updating, retrieval, and utilization) and establish protocols to evaluate…
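The abstract describes human-verified structured memories that carry explicit and implicit signals, and it defines tasks over them. Below is a minimal sketch of what such a memory store and two of those tasks could look like: updating, and scoring extraction against gold memories. The sketch assumes a flat key/value schema with an explicit/implicit tag, a "newer turn wins" update rule, and exact-match F1 on (key, value) pairs. `MemoryEntry`, `MemoryStore`, and `extraction_f1` are hypothetical names. None of these choices is the benchmark's documented protocol, and that description is truncated on this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    # Hypothetical fields; the page does not specify AlpsBench's memory schema.
    key: str     # attribute name, e.g. "diet"
    value: str   # attribute value, e.g. "vegetarian"
    signal: str  # assumed tag: "explicit" (stated by the user) or "implicit" (inferred)
    turn: int    # dialogue turn the evidence came from


@dataclass
class MemoryStore:
    entries: dict[str, MemoryEntry] = field(default_factory=dict)

    def update(self, new: MemoryEntry) -> None:
        """Insert a memory, or overwrite the stored one for the same key.

        Assumed conflict rule: evidence from a later (or equal) turn wins,
        and an update from an earlier turn is ignored as stale.
        """
        old = self.entries.get(new.key)
        if old is None or new.turn >= old.turn:
            self.entries[new.key] = new


def extraction_f1(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> float:
    """F1 between predicted and gold (key, value) pairs, counting exact matches only."""
    if not predicted and not gold:
        return 1.0
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Exact matching is the simplest choice. A real protocol for latent traits would likely need a softer matcher, such as human or model judgment, because implicit preferences can be phrased in many equivalent ways.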
