S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models

Xiaohan Yuan; Jinfeng Li; Dongxia Wang; Yuefeng Chen; Xiaofeng Mao; Longtao Huang; Jialuo Chen; Hui Xue; Xiaoxia Liu; Wenhai Wang; Kui Ren; Jingyi Wang

arXiv:2405.14191·cs.CR·April 8, 2025·1 cites

Open Access · 1 Repo · 1 Dataset

TL;DR

S-Eval is an automated framework that uses large language models to systematically evaluate the safety of other LLMs, addressing the lack of standardized risk assessment methods and enabling real-world safety monitoring.

Contribution

The paper introduces S-Eval, a novel LLM-based safety evaluation framework with a comprehensive risk taxonomy and dual LLM components for automated test generation and safety critique.
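The two-component design described above (one LLM generating tests from a risk taxonomy, another critiquing the responses) can be pictured with a minimal sketch. Everything here is an assumption for illustration: the function names, the callable interfaces, the toy stand-ins for the models, and the per-category safe-rate metric are hypothetical and are not taken from the S-Eval code or paper.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def evaluate_safety(
    risk_categories: Iterable[str],
    generate_tests: Callable[[str], List[str]],
    target_model: Callable[[str], str],
    critique: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Return, per risk category, the fraction of responses judged safe.

    All four arguments are hypothetical interfaces chosen for this sketch.
    """
    safe_counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for category in risk_categories:
        # Component 1: a generator produces test prompts for this category.
        for prompt in generate_tests(category):
            response = target_model(prompt)
            totals[category] += 1
            # Component 2: a critic labels the response safe (True) or unsafe.
            if critique(prompt, response):
                safe_counts[category] += 1
    # Categories with no generated prompts are omitted to avoid dividing by zero.
    return {c: safe_counts[c] / totals[c] for c in totals}


# Toy stand-ins so the sketch runs without any model; real components
# would be LLM calls.
def toy_generator(category: str) -> List[str]:
    return [f"test prompt about {category} #{i}" for i in range(2)]


def toy_target(prompt: str) -> str:
    return "I cannot help with that." if "#0" in prompt else "Sure, here is how."


def toy_critic(prompt: str, response: str) -> bool:
    return response.startswith("I cannot")


scores = evaluate_safety(
    ["category_a", "category_b"], toy_generator, toy_target, toy_critic
)
print(scores)
```

Passing the generator, target model, and critic as plain callables keeps the loop independent of any particular model API, so each piece can be swapped without touching the scoring logic.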

Findings

01. Effective in real-world safety assessment scenarios

02. Flexible and adaptable to evolving safety threats

03. Demonstrated efficiency and effectiveness in industrial deployment

Abstract

Generative large language models (LLMs) have revolutionized natural language processing with their transformative and emergent capabilities. However, recent evidence indicates that LLMs can produce harmful content that violates social norms, raising significant concerns about the safety and ethical ramifications of deploying these advanced models. It is therefore critical to perform a rigorous and comprehensive safety evaluation of LLMs before deployment. Despite this need, owing to the extensiveness of the LLM generation space, there is still no unified and standardized risk taxonomy that systematically reflects LLM content safety, nor are there automated safety assessment techniques that explore potential risks efficiently. To bridge this gap, we propose S-Eval, a novel LLM-based automated Safety Evaluation framework with a newly defined comprehensive risk…

Code & Models

Repositories

is2lab/s-eval
Official

Datasets

IS2Lab/S-Eval
dataset · 1.1k dl

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Safety Systems Engineering in Autonomy

Methods: Balanced Selection