S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
Xiaohan Yuan, Jinfeng Li, Dongxia Wang, Yuefeng Chen, Xiaofeng Mao, Longtao Huang, Jialuo Chen, Hui Xue, Xiaoxia Liu, Wenhai Wang, Kui Ren, Jingyi Wang

TL;DR
S-Eval is an automated framework that uses large language models to systematically evaluate the safety of other LLMs, addressing the lack of standardized risk assessment methods and enabling real-world safety monitoring.
Contribution
The paper introduces S-Eval, a novel LLM-based safety evaluation framework built on a comprehensive risk taxonomy and two LLM components: one that automatically generates safety test cases and one that critiques the responses for safety.
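To show how the two components fit together, here is a minimal sketch of a generate-then-critique evaluation loop. It is not the authors' implementation. The names `TestCase`, `Verdict`, `evaluate_safety`, `safety_score`, and `per_category_scores` are hypothetical, and the toy generator, target model, and critic are plain-Python stubs standing in for real LLM calls. The risk categories used here ("crime", "privacy") are illustrative placeholders, not S-Eval's taxonomy. Scoring safety as the fraction of responses the critic judges safe is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TestCase:
    risk_category: str  # hypothetical label drawn from some risk taxonomy
    prompt: str


@dataclass
class Verdict:
    test: TestCase
    response: str
    is_safe: bool


def evaluate_safety(
    generate_tests: Callable[[str, int], List[TestCase]],  # stands in for the test-generation LLM
    target_model: Callable[[str], str],                    # the LLM under evaluation
    critique: Callable[[str, str], bool],                  # stands in for the safety-critique LLM
    risk_categories: List[str],
    per_category: int,
) -> List[Verdict]:
    """Generate tests per risk category, query the target, and critique each response."""
    verdicts: List[Verdict] = []
    for category in risk_categories:
        for test in generate_tests(category, per_category):
            response = target_model(test.prompt)
            verdicts.append(Verdict(test, response, critique(test.prompt, response)))
    return verdicts


def safety_score(verdicts: List[Verdict]) -> float:
    """Fraction of responses judged safe (an assumed, illustrative metric)."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    return sum(v.is_safe for v in verdicts) / len(verdicts)


def per_category_scores(verdicts: List[Verdict]) -> Dict[str, float]:
    """Safe-response rate broken down by risk category."""
    counts: Dict[str, Tuple[int, int]] = {}
    for v in verdicts:
        safe, total = counts.get(v.test.risk_category, (0, 0))
        counts[v.test.risk_category] = (safe + int(v.is_safe), total + 1)
    return {cat: safe / total for cat, (safe, total) in counts.items()}


# Toy stubs (hypothetical) so the sketch runs without any model.
def toy_generator(category: str, n: int) -> List[TestCase]:
    return [TestCase(category, f"{category} probe {i}") for i in range(n)]


def toy_target(prompt: str) -> str:
    # Refuses "crime" prompts and complies with everything else.
    return "I can't help with that." if prompt.startswith("crime") else "Sure: " + prompt


def toy_critic(prompt: str, response: str) -> bool:
    # Treats a refusal as safe; a real critic would be an LLM judgment.
    return response.startswith("I can't")
```

With two categories and two probes each, `evaluate_safety(toy_generator, toy_target, toy_critic, ["crime", "privacy"], 2)` yields four verdicts, an overall score of 0.5, and per-category rates of 1.0 for "crime" and 0.0 for "privacy". Passing the generator and critic in as callables keeps the loop independent of any particular model backend.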
Findings
Effective in real-world safety assessment scenarios
Flexible and adaptable to evolving safety threats
Demonstrated efficiency and effectiveness in industrial deployment
Abstract
Generative large language models (LLMs) have revolutionized natural language processing with their transformative and emergent capabilities. However, recent evidence indicates that LLMs can produce harmful content that violates social norms, raising significant concerns about the safety and ethical ramifications of deploying these advanced models. It is therefore critical to perform a rigorous and comprehensive safety evaluation of LLMs before deployment. Despite this need, owing to the vastness of the LLM generation space, there is still no unified and standardized risk taxonomy that systematically reflects LLM content safety, nor are there automated safety assessment techniques that explore potential risks efficiently. To bridge this gap, we propose S-Eval, a novel LLM-based automated Safety Evaluation framework with a newly defined comprehensive risk…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Safety Systems Engineering in Autonomy
Methods: Balanced Selection
