CURATe: Benchmarking Personalised Alignment of Conversational AI Assistants

Lize Alberts; Benjamin Ellis; Andrei Lupu; Jakob Foerster

arXiv:2410.21159·cs.HC·January 31, 2025

TL;DR

This paper introduces a benchmark for evaluating how well conversational AI assistants can maintain personalized, safety-critical context over multiple turns, revealing systematic biases and inconsistencies in current models.

Contribution

It presents a new multi-turn benchmark for personalised alignment, analyzes ten leading models across five scenarios, and identifies key failure modes and potential research directions.

Findings

01 Even top-rated models often recommend actions that are harmful to the user given the context provided

02 Prompting with safety-critical context improves model performance

03 Models exhibit biases such as sycophancy and poor attentiveness to context

Abstract

We introduce a multi-turn benchmark for evaluating personalised alignment in LLM-based AI assistants, focusing on their ability to handle user-provided safety-critical contexts. Our assessment of ten leading models across five scenarios (with 337 use cases each) reveals systematic inconsistencies in maintaining user-specific consideration, with even top-rated "harmless" models making recommendations that should be recognised as obviously harmful to the user given the context provided. Key failure modes include inappropriate weighing of conflicting preferences, sycophancy (prioritising desires above safety), a lack of attentiveness to critical user information within the context window, and inconsistent application of user-specific knowledge. The same systematic biases were observed in OpenAI's o1, suggesting that strong reasoning capacities do not necessarily transfer to this kind of…

Code & Models

Repositories

lize-alberts/llm_prag_benchmark (official)

Taxonomy

Topics: AI in Service Interactions · Topic Modeling · Context-Aware Activity Recognition Systems