ALPBench: A Benchmark for Attribution-level Long-term Personal Behavior Understanding

Lu Ren; Junda She; Xinchen Luo; Tao Wang; Xin Ye; Xu Zhang; Muxuan Wang; Xiao Yang; Chenguang Wang; Fei Xie; Yiwei Zhou; Danjun Wu; Guodong Zhang; Yifei Hu; Guoying Zheng; Shujie Yang; Xingmei Wang; Shiyao Wang; Yukun Zhou; Fan Yang; Size Li; Kuo Cai; Qiang Luo; Ruiming Tang; Han Li; Kun Gai

arXiv:2602.03056 · cs.IR · February 4, 2026

Open Access

TL;DR

ALPBench is a new benchmark designed to evaluate large language models' ability to understand and predict long-term user preferences at an attribution level, focusing on complex attribute interactions and reasoning over user history.

Contribution

The paper introduces an attribution-level benchmark that assesses LLMs' capacity for long-term personal behavior understanding by having them predict the attribute combinations a user is interested in.

Findings

01. ALPBench enables detailed evaluation of personalization capabilities.

02. Current LLMs struggle with complex attribute interaction reasoning.

03. The benchmark emphasizes long-term behavior modeling over explicit user requests.

Abstract

Recent advances in large language models have highlighted their potential for personalized recommendation, where accurately capturing user preferences remains a key challenge. Leveraging their strong reasoning and generalization capabilities, LLMs offer new opportunities for modeling long-term user behavior. To systematically evaluate this, we introduce ALPBench, a Benchmark for Attribution-level Long-term Personal Behavior Understanding. Unlike item-focused benchmarks, ALPBench predicts user-interested attribute combinations, enabling ground-truth evaluation even for newly introduced items. It models preferences from long-term historical behaviors rather than users' explicitly expressed requests, better reflecting enduring interests. User histories are represented as natural language sequences, allowing interpretable, reasoning-based personalization. ALPBench enables fine-grained…
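To make the task setup in the abstract concrete, the following is a minimal sketch of how a user history might be rendered as a natural-language sequence and how a predicted attribute combination might be compared with a ground-truth one. Everything in it is an assumption for illustration: the attribute names (`category`, `brand`, `price_band`), the helper names `history_to_text` and `score_prediction`, and the two scores are hypothetical and are not taken from the paper, whose actual data schema and metrics are not given in this summary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Interaction:
    """One historical behavior, described only by attributes (hypothetical schema)."""
    category: str
    brand: str
    price_band: str


def history_to_text(history: List[Interaction]) -> str:
    """Render a history as numbered natural-language lines, oldest first."""
    lines = []
    for i, item in enumerate(history, start=1):
        lines.append(
            f"{i}. category={item.category}; brand={item.brand}; "
            f"price_band={item.price_band}"
        )
    return "\n".join(lines)


def score_prediction(predicted: Dict[str, str], target: Dict[str, str]) -> Dict[str, object]:
    """Compare a predicted attribute combination with the ground truth.

    Illustrative scores only (not the paper's metrics):
    - attribute_accuracy: fraction of target attributes predicted correctly
    - combination_match: True only if every target attribute is correct
    """
    per_attribute = {name: predicted.get(name) == value for name, value in target.items()}
    return {
        "attribute_accuracy": sum(per_attribute.values()) / len(per_attribute),
        "combination_match": all(per_attribute.values()),
    }


history = [
    Interaction("running shoes", "acme", "mid"),
    Interaction("trail shoes", "acme", "mid"),
]
prompt_text = history_to_text(history)

# A model's (hypothetical) answer versus the held-out ground truth.
predicted = {"category": "running shoes", "brand": "acme", "price_band": "low"}
target = {"category": "running shoes", "brand": "acme", "price_band": "mid"}
result = score_prediction(predicted, target)
```

Because the target is a combination of attributes rather than an item ID, a comparison of this kind stays well defined for items that did not exist when the history was collected, which is the property the abstract highlights.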

Taxonomy

Topics: Recommender Systems and Techniques · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning