LUCid: Redefining Relevance For Lifelong Personalization
Chimaobi Okite, Anika Misra, Joyce Chai, Rada Mihalcea

TL;DR
LUCid introduces a benchmark to evaluate the ability of personalization models to surface situationally relevant user information from distant interactions, revealing significant performance gaps in current systems.
Contribution
The paper presents LUCid, a new benchmark for measuring situational relevance in lifelong personalization, highlighting the limitations of existing models in retrieving relevant distant context.
Findings
Retrieval recall drops to near zero on the hardest instances, where the relevant context sits in semantically distant history.
State-of-the-art models achieve only about 50% response alignment.
Current relevance notions do not match the situational relevance needed for personalization.
Abstract
Current approaches to lifelong personalization operationalize relevance through semantic proximity, causing them to miss essential user information from topically unrelated interactions. To address this gap, we introduce LUCid, a benchmark designed to measure situational user-centric relevance in personalization. The benchmark consists of 1,936 realistic queries paired with interaction histories from up to 500 sessions. Across multiple architectures, our experiments show significant performance collapse when relevant context must be surfaced from semantically distant history: retrieval recall drops to near zero on the hardest instances, and response alignment remains near 50% even for state-of-the-art models such as Gemini-3-Flash, GPT-5.4, and Claude Haiku. These results expose a fundamental mismatch between the notion of relevance encoded by current systems and the situational…
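The failure mode described in the abstract can be illustrated with a toy example. The sketch below is not the benchmark's evaluation code. It assumes a retriever that ranks past sessions by cosine similarity to the query embedding and scores it with recall@k. The session names, the three-dimensional vectors, and the helpers `cosine`, `retrieve_top_k`, and `recall_at_k` are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_top_k(query_vec, session_vecs, k):
    """Return the ids of the k sessions most similar to the query."""
    ranked = sorted(
        session_vecs,
        key=lambda sid: cosine(query_vec, session_vecs[sid]),
        reverse=True,
    )
    return ranked[:k]


def recall_at_k(retrieved, relevant):
    """Fraction of the relevant sessions that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Hypothetical toy embeddings: two sessions are topically close to the query,
# while the session that actually matters (an allergy mentioned during an
# unrelated conversation) is semantically distant.
sessions = {
    "s1_dining": [0.9, 0.1, 0.0],
    "s2_recipes": [0.8, 0.2, 0.1],
    "s3_allergy": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]      # e.g. a restaurant recommendation request
relevant = {"s3_allergy"}    # situationally relevant, topically unrelated

top2 = retrieve_top_k(query, sessions, k=2)
print(top2, recall_at_k(top2, relevant))
```

In this toy setup the two topically similar sessions fill the top-2 slots, so the situationally relevant session is never retrieved and recall@2 is zero. Raising k to cover the whole history recovers it, but that approach is unlikely to be practical for histories spanning hundreds of sessions.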
