Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach

Yuchen Wu; Edward Sun; Kaijie Zhu; Jianxun Lian; Jose Hernandez-Orallo; Aylin Caliskan; Jindong Wang

arXiv:2505.18882·cs.CY·January 14, 2026

TL;DR

This paper introduces a personalized safety benchmark and a planning-based agent for LLMs, demonstrating that personalized user information significantly enhances safety and proposing a practical, low-cost method for safe, user-specific responses.

Contribution

It presents PENGUIN, a comprehensive benchmark for personalized safety, and RAISE, a novel, training-free agent framework that improves safety by selectively acquiring user background information.
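The idea of selectively acquiring user background information can be illustrated with a small sketch. This is not RAISE's actual planning procedure, which this summary does not describe; it is a plain greedy ranking under a query budget. The function name `plan_queries`, the attribute names, and the usefulness scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def plan_queries(
    known: Dict[str, str],
    attribute_value: Dict[str, float],
    max_queries: int,
    min_value: float = 0.0,
) -> List[str]:
    """Pick which missing user attributes to ask about, most useful first.

    known:           attributes the user has already provided
    attribute_value: hypothetical usefulness score per attribute (assumed, not from the paper)
    max_queries:     upper bound on follow-up questions
    min_value:       attributes scoring below this are never asked about
    """
    # Only consider attributes we do not have yet.
    missing = [name for name in attribute_value if name not in known]
    # Rank the missing attributes by their assumed usefulness, highest first.
    ranked = sorted(missing, key=lambda name: attribute_value[name], reverse=True)
    # Drop low-value attributes, then respect the query budget.
    worthwhile = [name for name in ranked if attribute_value[name] >= min_value]
    return worthwhile[:max_queries]


# Hypothetical scores for illustration only.
scores = {
    "emotional_state": 0.9,
    "mental_health_history": 0.8,
    "age": 0.4,
    "occupation": 0.1,
}
plan = plan_queries({"age": "34"}, scores, max_queries=2, min_value=0.3)
print(plan)
```

In this sketch the agent would ask about at most two attributes before answering, skipping `age` because it is already known and `occupation` because its assumed score falls below the threshold.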

Findings

01 Across six leading LLMs, providing personalized user information improves safety scores by 43.2%.

02 RAISE improves safety scores by up to 31.6%.

03 Interaction cost stays low, at 2.7 queries per user.

Abstract

Large language models (LLMs) typically generate identical or similar responses for all users given the same prompt, posing serious safety risks in high-stakes applications where user vulnerabilities differ widely. Existing safety evaluations primarily rely on context-independent metrics - such as factuality, bias, or toxicity - overlooking the fact that the same response may carry divergent risks depending on the user's background or condition. We introduce personalized safety to fill this gap and present PENGUIN - a benchmark comprising 14,000 scenarios across seven sensitive domains with both context-rich and context-free variants. Evaluating six leading LLMs, we demonstrate that personalized user information significantly improves safety scores by 43.2%, confirming the effectiveness of personalization in safety alignment. However, not all context attributes contribute equally to…

Code & Models

Datasets

wick1d/Personalized_Safety_Data
dataset · 31 downloads

Videos

Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach· slideslive

Taxonomy

Topics: Safety Systems Engineering in Autonomy · Business Process Modeling and Analysis