TL;DR
This study highlights the importance of incorporating user context in evaluating LLM safety, revealing that current methods may underestimate risks for vulnerable users and proposing a new context-aware evaluation approach.
Contribution
The paper introduces a methodology for assessing LLM safety that accounts for user-specific context, and shows that context disclosure alone does not sufficiently improve safety evaluations for vulnerable populations.
Findings
Context-aware evaluations significantly alter safety ratings for vulnerable users.
Disclosing realistic user context does not necessarily improve safety assessment accuracy.
Effective safety evaluation requires considering diverse user profiles, not relying on context disclosure alone.
Abstract
Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities. However, millions use LLMs for personal advice on high-stakes topics like finance and health, where harms are context-dependent rather than universal. While frameworks like the OECD's AI classification recognize the need to assess individual risks, user-welfare safety evaluations remain underdeveloped. We argue that developing such evaluations is non-trivial due to fundamental questions about accounting for user context in evaluation design. In this exploratory study, we evaluated advice on finance and health from GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro across user profiles of varying vulnerability. First, we demonstrate that evaluators must have access to rich user context: identical LLM responses were rated significantly safer by…
