Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach
Yuchen Wu, Edward Sun, Kaijie Zhu, Jianxun Lian, Jose Hernandez-Orallo, Aylin Caliskan, Jindong Wang

TL;DR
This paper introduces a personalized safety benchmark and a planning-based agent for LLMs, demonstrating that personalized user information significantly enhances safety and proposing a practical, low-cost method for safe, user-specific responses.
Contribution
It presents PENGUIN, a comprehensive benchmark for personalized safety, and RAISE, a novel, training-free agent framework that improves safety by selectively acquiring user background information.
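To make "selectively acquiring user background information" concrete, here is a minimal sketch under stated assumptions. It is not the RAISE planner: it uses a plain greedy loop as a stand-in. The attribute names, priority values, threshold, and query budget are all hypothetical, as is the function `acquire_context`. The idea shown is only that an agent can ask for the most informative missing attribute first and stop once it judges the context sufficient.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical attribute priorities (illustrative values, not from the paper).
ATTRIBUTE_PRIORITY: Dict[str, float] = {
    "emotional_state": 0.9,
    "mental_health_history": 0.8,
    "age": 0.5,
    "occupation": 0.2,
}


def acquire_context(
    known: Dict[str, str],
    ask_user: Callable[[str], Optional[str]],
    sufficiency_threshold: float = 1.5,  # assumed stopping criterion
    max_queries: int = 3,                # assumed query budget
) -> Tuple[Dict[str, str], int]:
    """Greedily ask for the highest-priority attribute not yet known.

    Stops when the summed priority of known attributes reaches the
    threshold, the query budget is spent, or nothing is left to ask.
    Returns the gathered context and the number of queries issued.
    """
    context = dict(known)
    asked: Set[str] = set()
    queries = 0
    while queries < max_queries:
        covered = sum(ATTRIBUTE_PRIORITY.get(name, 0.0) for name in context)
        if covered >= sufficiency_threshold:
            break
        candidates = [
            name for name in ATTRIBUTE_PRIORITY
            if name not in context and name not in asked
        ]
        if not candidates:
            break
        target = max(candidates, key=lambda name: ATTRIBUTE_PRIORITY[name])
        asked.add(target)
        queries += 1
        answer = ask_user(target)
        if answer is not None:  # the user may decline to answer
            context[target] = answer
    return context, queries


if __name__ == "__main__":
    # Simulated user who answers two of the possible questions.
    replies = {
        "emotional_state": "distressed",
        "mental_health_history": "none reported",
    }
    ctx, n = acquire_context({"age": "34"}, replies.get)
    print(ctx, n)
```

In this sketch the loop stops before asking about the low-priority attribute, which is the sense in which acquisition is selective. A real system would replace the fixed priority table with whatever attribute-ranking procedure it actually uses.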
Findings
Across six evaluated LLMs, providing personalized user information improves safety scores by 43.2%.
RAISE improves safety scores by up to 31.6%.
RAISE keeps interaction cost low, at 2.7 queries per user.
Abstract
Large language models (LLMs) typically generate identical or similar responses for all users given the same prompt, posing serious safety risks in high-stakes applications where user vulnerabilities differ widely. Existing safety evaluations primarily rely on context-independent metrics, such as factuality, bias, or toxicity, overlooking the fact that the same response may carry divergent risks depending on the user's background or condition. We introduce personalized safety to fill this gap and present PENGUIN, a benchmark comprising 14,000 scenarios across seven sensitive domains with both context-rich and context-free variants. Evaluating six leading LLMs, we demonstrate that personalized user information significantly improves safety scores by 43.2%, confirming the effectiveness of personalization in safety alignment. However, not all context attributes contribute equally to…
Taxonomy
Topics: Safety Systems Engineering in Autonomy · Business Process Modeling and Analysis
