Safe for Whom? Rethinking How We Evaluate the Safety of LLMs for Real Users

Manon Kempermann; Sai Suresh Macharla Vasu; Mahalakshmi Raveenthiran; Theo Farrell; Ingmar Weber

arXiv:2512.10687 · cs.AI · April 21, 2026

TL;DR

This study highlights the importance of incorporating user context in evaluating LLM safety, revealing that current methods may underestimate risks for vulnerable users and proposing a new context-aware evaluation approach.

Contribution

The paper introduces a methodology for assessing LLM safety with user-specific context, showing that context disclosure alone does not sufficiently improve safety evaluations for vulnerable populations.

Findings

01. Context-aware evaluations significantly alter safety ratings for vulnerable users.
02. Disclosing realistic user context does not necessarily improve safety assessment accuracy.
03. Effective safety evaluation requires diverse user profile considerations beyond context disclosure.

Abstract

Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities. However, millions use LLMs for personal advice on high-stakes topics like finance and health, where harms are context-dependent rather than universal. While frameworks like the OECD's AI classification recognize the need to assess individual risks, user-welfare safety evaluations remain underdeveloped. We argue that developing such evaluations is non-trivial due to fundamental questions about accounting for user context in evaluation design. In this exploratory study, we evaluated advice on finance and health from GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro across user profiles of varying vulnerability. First, we demonstrate that evaluators must have access to rich user context: identical LLM responses were rated significantly safer by…
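The first finding rests on a paired design: the same LLM response is rated once by an evaluator who sees only the prompt and response (context-blind) and once by an evaluator who also knows the user's profile (context-aware). Below is a minimal sketch of how such paired ratings could be summarized per vulnerability level. It assumes a 1 to 7 safety scale and free-form vulnerability labels; the class and function names are hypothetical and not taken from the paper's repository, and the toy numbers are invented for illustration, not results from the study.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical record: one response rated twice on an assumed 1-7 scale
# (1 = unsafe, 7 = safe), without and with knowledge of the user's profile.
@dataclass(frozen=True)
class PairedRating:
    vulnerability: str   # hypothetical label, e.g. "low" or "high"
    context_blind: int   # rating from the evaluator without user context
    context_aware: int   # rating from the evaluator with user context


def safety_gap_by_vulnerability(ratings: list[PairedRating]) -> dict[str, float]:
    """Mean (context_blind - context_aware) rating per vulnerability level.

    A positive gap means the response looked safer to the evaluator
    who did not know who the user was.
    """
    groups: dict[str, list[int]] = {}
    for r in ratings:
        # Reject scores outside the assumed 1-7 scale.
        for score in (r.context_blind, r.context_aware):
            if not 1 <= score <= 7:
                raise ValueError(f"score out of range: {score}")
        groups.setdefault(r.vulnerability, []).append(r.context_blind - r.context_aware)
    return {level: mean(diffs) for level, diffs in groups.items()}


# Toy, made-up ratings purely to show the call shape.
toy = [
    PairedRating("low", 6, 6),
    PairedRating("low", 5, 4),
    PairedRating("high", 5, 3),
    PairedRating("high", 6, 3),
]
print(safety_gap_by_vulnerability(toy))  # {'low': 0.5, 'high': 2.5}
```

Keeping both ratings on the same record preserves the pairing, so the gap is computed per response rather than by comparing two independent averages; a real analysis would add a paired significance test on these differences.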
