Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History
Serin Kim, Sangam Lee, Dongha Lee

TL;DR
Persona2Web introduces a comprehensive benchmark for evaluating personalized web agents' ability to interpret ambiguous queries by leveraging user history, addressing a key gap in current web agent capabilities.
Contribution
The paper presents the first benchmark for assessing personalized web agents on the real open web, focusing on inferring user preferences from history rather than from explicit instructions.
Findings
Agents face significant challenges in personalization based on user history.
Performance varies across different architectures and models.
The benchmark enables fine-grained evaluation of personalization capabilities.
Abstract
Large language models have advanced web agents, yet current agents lack personalization capabilities. Since users rarely specify every detail of their intent, practical web agents must be able to interpret ambiguous queries by inferring user preferences and contexts. To address this challenge, we present Persona2Web, the first benchmark for evaluating personalized web agents on the real open web, built upon the clarify-to-personalize principle, which requires agents to resolve ambiguity based on user history rather than relying on explicit instructions. Persona2Web consists of: (1) user histories that reveal preferences implicitly over long time spans, (2) ambiguous queries that require agents to infer implicit user preferences, and (3) a reasoning-aware evaluation framework that enables fine-grained assessment of personalization. We conduct extensive experiments across various agent…
Taxonomy
Topics: Persona Design and Applications · Multimodal Machine Learning Applications · Recommender Systems and Techniques
