AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

Yi Zeng; Yu Yang; Andy Zhou; Jeffrey Ziwei Tan; Yuheng Tu; Yifan Mai; Kevin Klyman; Minzhou Pan; Ruoxi Jia; Dawn Song; Percy Liang; Bo Li

arXiv:2407.17436 · cs.CY · August 7, 2024



TL;DR

AIR-Bench 2024 is a comprehensive safety benchmark for AI models, aligned with recent regulations and policies, featuring a detailed taxonomy and diverse prompts to evaluate model safety and compliance.

Contribution

The paper introduces AIR-Bench 2024, the first safety benchmark based on regulation-driven risk categories, bridging the gap between existing benchmarks and real-world safety concerns.

Findings

1. Leading models show varied alignment with safety categories.
2. The benchmark reveals strengths and weaknesses in current AI safety.
3. It provides a standardized evaluation framework for safety compliance.

Abstract

Foundation models (FMs) provide societal benefits but also amplify risks. Governments, companies, and researchers have proposed regulatory frameworks, acceptable use policies, and safety benchmarks in response. However, existing public benchmarks often define safety categories based on previous literature, intuitions, or common sense, leading to disjointed sets of categories for risks specified in recent regulations and policies, which makes it challenging to evaluate and compare FMs across these benchmarks. To bridge this gap, we introduce AIR-Bench 2024, the first AI safety benchmark aligned with emerging government regulations and company policies, following the regulation-based safety categories grounded in our AI risks study, AIR 2024. AIR 2024 decomposes 8 government regulations and 16 company policies into a four-tiered safety taxonomy with 314 granular risk categories in the…


Code & Models

Datasets

ScaleAI/BrowserART (dataset, 119 downloads)
