SafeLawBench: Towards Safe Alignment of Large Language Models

Chuxue Cao; Han Zhu; Jiaming Ji; Qichao Sun; Zhenghao Zhu; Yinyu Wu; Juntao Dai; Yaodong Yang; Sirui Han; Yike Guo

arXiv:2506.06636·cs.CL·June 10, 2025

TL;DR

This paper introduces SafeLawBench, a safety evaluation benchmark for large language models that is grounded in legal standards. It exposes safety limitations in current models and reports that majority voting improves performance.

Contribution

The paper presents what the authors describe as the first safety benchmark for LLMs built from a legal perspective, together with an extensive evaluation and insights into safety performance and reasoning stability.

Findings

01. Leading models like GPT-4o score below 80.5% accuracy.

02. The average safety accuracy across the evaluated LLMs is 68.8%.

03. Majority voting improves safety evaluation performance.
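
To illustrate the majority voting finding, here is a minimal Python sketch of how several sampled answers to one multi-choice question could be combined into a single prediction and then scored. This summary does not describe the paper's exact voting procedure, so the function names (`majority_vote`, `accuracy`), the first-seen tie-breaking rule, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among several samples.

    Tie-breaking (assumption): the tied answer that appears first
    in the sample list wins.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    best = max(counts.values())
    for answer in answers:
        if counts[answer] == best:
            return answer


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical data: three sampled answers per question.
    samples = [
        ["B", "A", "B"],
        ["C", "C", "D"],
        ["A", "D", "D"],
    ]
    gold = ["B", "C", "A"]
    voted = [majority_vote(s) for s in samples]
    print(voted, accuracy(voted, gold))
```

Voting over several samples tends to smooth out answers that flip between runs, which is one plausible reason it would help on a benchmark that also measures reasoning stability.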

Abstract

With the growing prevalence of large language models (LLMs), the safety of LLMs has raised significant concerns. However, there is still a lack of definitive standards for evaluating their safety due to the subjective nature of current safety benchmarks. To address this gap, we conducted the first exploration of LLMs' safety evaluation from a legal perspective by proposing the SafeLawBench benchmark. SafeLawBench categorizes safety risks into three levels based on legal standards, providing a systematic and comprehensive framework for evaluation. It comprises 24,860 multi-choice questions and 1,106 open-domain question-answering (QA) tasks. Our evaluation included 2 closed-source LLMs and 18 open-source LLMs using zero-shot and few-shot prompting, highlighting the safety features of each model. We also evaluated the LLMs' safety-related reasoning stability and refusal behavior.…

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification