SC-Safety: A Multi-round Open-ended Question Adversarial Safety Benchmark for Large Language Models in Chinese
Liang Xu, Kangkang Zhao, Lei Zhu, Hang Xue

TL;DR
This paper introduces SC-Safety, a comprehensive multi-round adversarial benchmark for evaluating the safety of Chinese large language models, revealing safety performance differences among various models and guiding safer AI development.
Contribution
It presents SC-Safety, a novel multi-round open-ended question benchmark for Chinese LLM safety assessment, with extensive adversarial interactions and insights into model safety performance.
Findings
Closed-source models are safer than open-source ones.
Chinese models have safety levels comparable to GPT-3.5-turbo.
Smaller models (6B-13B) can perform well in safety.
Abstract
Large language models (LLMs), like ChatGPT and GPT-4, have demonstrated remarkable abilities in natural language understanding and generation. However, alongside their positive impact on our daily tasks, they can also produce harmful content that negatively affects societal perceptions. To systematically assess the safety of Chinese LLMs, we introduce SuperCLUE-Safety (SC-Safety), a multi-round adversarial benchmark with 4912 open-ended questions covering more than 20 safety sub-dimensions. Adversarial human-model interactions and conversations significantly increase the challenges compared to existing methods. Experiments on 13 major LLMs supporting Chinese yield the following insights: 1) Closed-source models outperform open-source ones in terms of safety; 2) Models released from China demonstrate comparable safety levels to LLMs like GPT-3.5-turbo; 3) Some smaller models with…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Linear Layer · Cosine Annealing · Weight Decay · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Linear Warmup With Cosine Annealing · Attention Is All You Need
