CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants
Lize Alberts, Benjamin Ellis, Andrei Lupu, Jakob Foerster

TL;DR
This paper introduces a benchmark for evaluating how well conversational AI assistants can maintain personalized, safety-critical context over multiple turns, revealing systematic biases and inconsistencies in current models.
Contribution
The paper presents a new multi-turn benchmark for personalized alignment, evaluates ten leading models across five scenarios, and identifies key failure modes and potential research directions.
Findings
Even top-rated "harmless" models recommend actions that are clearly harmful given the context the user provided
Prompting with safety-critical context improves model performance
Models exhibit biases like sycophancy and poor context attentiveness
Abstract
We introduce a multi-turn benchmark for evaluating personalised alignment in LLM-based AI assistants, focusing on their ability to handle user-provided safety-critical contexts. Our assessment of ten leading models across five scenarios (with 337 use cases each) reveals systematic inconsistencies in maintaining user-specific consideration, with even top-rated "harmless" models making recommendations that should be recognised as obviously harmful to the user given the context provided. Key failure modes include inappropriate weighing of conflicting preferences, sycophancy (prioritising desires above safety), a lack of attentiveness to critical user information within the context window, and inconsistent application of user-specific knowledge. The same systematic biases were observed in OpenAI's o1, suggesting that strong reasoning capacities do not necessarily transfer to this kind of…
Taxonomy
Topics: AI in Service Interactions · Topic Modeling · Context-Aware Activity Recognition Systems
