Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models
Yeonjun In, Wonjoong Kim, Kanghoon Yoon, Sungchul Kim, Mehrab Tanjim, Sangwu Park, Kibum Kim, Chanyoung Park

TL;DR
This paper introduces U-SafeBench, a benchmark for evaluating whether large language models act safely with respect to user-specific safety standards. It shows that current models fall short and proposes a chain-of-thought remedy to improve user-specific safety.
Contribution
The paper presents the first benchmark for user-specific safety evaluation of LLMs, demonstrates that current models fail to meet personalized safety standards, and proposes a simple improvement method.
Findings
Current LLMs fail to meet user-specific safety standards.
U-SafeBench effectively evaluates user-specific safety.
Chain-of-thought improves safety performance.
Abstract
As the use of large language model (LLM) agents continues to grow, their safety vulnerabilities have become increasingly evident. Extensive benchmarks evaluate various aspects of LLM safety, but they define safety by relying heavily on general standards and overlook user-specific standards. However, safety standards for LLMs may vary with a user's profile rather than being universally consistent across all users. This raises a critical research question: Do LLM agents act safely when considering user-specific safety standards? Despite its importance for safe LLM use, no benchmark datasets currently exist to evaluate the user-specific safety of LLMs. To address this gap, we introduce U-SafeBench, a benchmark designed to assess the user-specific aspect of LLM safety. Our evaluation of 20 widely used LLMs reveals that current LLMs fail to act safely when considering user-specific safety…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
