SC-Safety: A Multi-round Open-ended Question Adversarial Safety Benchmark for Large Language Models in Chinese

Liang Xu; Kangkang Zhao; Lei Zhu; Hang Xue

arXiv:2310.05818 · cs.CL · October 10, 2023 · 2 citations

TL;DR

This paper introduces SC-Safety, a comprehensive multi-round adversarial benchmark for evaluating the safety of Chinese large language models, revealing safety performance differences among various models and guiding safer AI development.

Contribution

It presents SC-Safety, a novel multi-round, open-ended question benchmark for assessing the safety of Chinese LLMs, built on extensive adversarial interactions, and reports insights into how current models perform on safety.

Findings

1. Closed-source models are safer than open-source ones.
2. Chinese models have safety levels comparable to GPT-3.5-turbo.
3. Some smaller models (6B-13B parameters) can perform well on safety.

Abstract

Large language models (LLMs), like ChatGPT and GPT-4, have demonstrated remarkable abilities in natural language understanding and generation. However, alongside their positive impact on our daily tasks, they can also produce harmful content that negatively affects societal perceptions. To systematically assess the safety of Chinese LLMs, we introduce SuperCLUE-Safety (SC-Safety): a multi-round adversarial benchmark with 4,912 open-ended questions covering more than 20 safety sub-dimensions. Adversarial human-model interactions and multi-turn conversations make the benchmark significantly more challenging than existing methods. Experiments on 13 major LLMs supporting Chinese yield the following insights: 1) Closed-source models outperform open-source ones in terms of safety; 2) Models released from China demonstrate comparable safety levels to LLMs like GPT-3.5-turbo; 3) Some smaller models with…
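
The abstract does not spell out the evaluation loop, so the following is only a minimal sketch of how a multi-round safety benchmark could be scored, not SC-Safety's released code or its official scoring rule. It assumes each item is a fixed list of user turns (in the real benchmark, follow-ups may be written adaptively from the model's previous answer), that every reply receives a binary safe/unsafe verdict from some judge, and that a model's score in a sub-dimension is the fraction of safe replies pooled over all rounds. `SafetyItem`, `run_item`, `score_by_dimension`, `caving_model`, `keyword_judge`, and the sub-dimension labels are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A conversation history is a list of (role, text) pairs.
History = List[Tuple[str, str]]
# Hypothetical interfaces: a model maps the history to a reply, and a judge
# decides whether one reply is safe (binary verdicts are an assumption).
Model = Callable[[History], str]
Judge = Callable[[History, str], bool]


@dataclass
class SafetyItem:
    """One hypothetical multi-round item: an opening question plus follow-ups."""
    sub_dimension: str  # illustrative label, not the benchmark's own taxonomy
    turns: List[str]    # fixed user turns (real follow-ups may be adaptive)


def run_item(model: Model, judge: Judge, item: SafetyItem) -> List[bool]:
    """Play each user turn against the model and record one verdict per round."""
    history: History = []
    verdicts: List[bool] = []
    for user_turn in item.turns:
        history.append(("user", user_turn))
        reply = model(history)
        verdicts.append(judge(history, reply))
        history.append(("assistant", reply))
    return verdicts


def score_by_dimension(model: Model, judge: Judge,
                       items: List[SafetyItem]) -> Dict[str, float]:
    """Fraction of safe replies per sub-dimension, pooled over all rounds."""
    safe: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for item in items:
        for is_safe in run_item(model, judge, item):
            safe[item.sub_dimension] += int(is_safe)
            total[item.sub_dimension] += 1
    return {dim: safe[dim] / total[dim] for dim in total}


# --- Toy stand-ins for illustration only (hypothetical) ---
REFUSAL = "Sorry, I can't help with that."


def caving_model(history: History) -> str:
    """Refuses the opening question but gives in to any follow-up."""
    user_turns = sum(1 for role, _ in history if role == "user")
    return REFUSAL if user_turns == 1 else "Sure, here is how..."


def keyword_judge(history: History, reply: str) -> bool:
    """Crude judge: a reply counts as safe only if it is the refusal."""
    return reply == REFUSAL


if __name__ == "__main__":
    items = [
        SafetyItem("privacy", ["opening question A", "follow-up"]),
        SafetyItem("privacy", ["opening question B", "follow-up"]),
        SafetyItem("illegal_activity", ["opening question C", "push", "push harder"]),
    ]
    print(score_by_dimension(caving_model, keyword_judge, items))
```

With the toy model, which refuses only the opening question, the two-turn `privacy` items score 0.5 and the three-turn item scores about 0.33, even though a single-turn test would have rated the model fully safe. This illustrates why adversarial follow-up rounds make such a benchmark harder than one-shot questioning.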

Code & Models

Models

CTCT-CT2/changeway_guardrails (Hugging Face model · 10 downloads · 2 likes)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Linear Layer · Cosine Annealing · Weight Decay · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Linear Warmup With Cosine Annealing · Attention Is All You Need