CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs

Zhihao Liu; Chenhui Hu

arXiv:2410.21695·cs.CL·October 30, 2024

TL;DR

This paper introduces CFSafety, a comprehensive benchmark with 25,000 prompts to evaluate the safety of large language models across various scenarios and attack types, revealing that even top models like GPT-4 need further safety improvements.

Contribution

The paper presents a new safety assessment benchmark, CFSafety, which combines 5 classic safety scenarios and 5 types of instruction attacks (10 categories of safety questions in total) to systematically evaluate LLM safety performance.

Findings

01 GPT-4 shows the best safety performance among tested models.

02 All models, including GPT-4, still require safety improvements.

03 The benchmark provides a standardized way to evaluate LLM safety.

Abstract

As large language models (LLMs) rapidly evolve, they bring significant conveniences to our work and daily lives, but also introduce considerable safety risks. These models can generate texts with social biases or unethical content, and under specific adversarial instructions, may even incite illegal activities. Therefore, rigorous safety assessments of LLMs are crucial. In this work, we introduce a safety assessment benchmark, CFSafety, which integrates 5 classic safety scenarios and 5 types of instruction attacks, totaling 10 categories of safety questions, to form a test set with 25k prompts. This test set was used to evaluate the natural language generation (NLG) capabilities of LLMs, employing a combination of simple moral judgment and a 1-5 safety rating scale for scoring. Using this benchmark, we tested eight popular LLMs, including the GPT series. The results indicate that while…
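
The abstract says that scoring combines a simple moral judgment with a 1-5 safety rating, but the visible text does not say how the two are merged. The Python sketch below shows one possible way to blend them and average the result per category. The equal-weight blend, the `Judgement` type, the function names, and the category labels are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Judgement:
    category: str  # one of the 10 question categories (placeholder label)
    is_safe: bool  # simple moral judgment on the model's response
    rating: int    # 1-5 safety rating, where 5 is assumed to mean safest


def combined_score(j: Judgement, weight: float = 0.5) -> float:
    """Blend both signals into [0, 1]; the weighting is an assumption."""
    if not 1 <= j.rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    binary = 1.0 if j.is_safe else 0.0
    scaled = (j.rating - 1) / 4  # map 1..5 onto 0..1
    return weight * binary + (1 - weight) * scaled


def per_category(judgements: list[Judgement]) -> dict[str, float]:
    """Average the combined score within each category."""
    buckets: dict[str, list[float]] = {}
    for j in judgements:
        buckets.setdefault(j.category, []).append(combined_score(j))
    return {name: mean(scores) for name, scores in buckets.items()}
```

Calling `per_category` on a list of judged responses would give one averaged score per category, which makes it straightforward to compare the scenario categories against the instruction-attack categories for a given model.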

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Label Smoothing · Absolute Position Encodings · Multi-Head Attention · Softmax · Linear Warmup With Cosine Annealing · Adam