Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails

Shaona Ghosh; Prasoon Varshney; Makesh Narsimhan Sreedhar; Aishwarya Padmakumar; Traian Rebedea; Jibin Rajan Varghese; Christopher Parisien

arXiv:2501.09004·cs.CL·January 16, 2025·2 cites

TL;DR

Aegis2.0 introduces a comprehensive, human-annotated safety dataset and a detailed risk taxonomy for LLMs, enabling more adaptable and effective safety guardrails through lightweight models and hybrid annotation methods.

Contribution

The paper presents a new safety taxonomy, a large annotated dataset, and a hybrid annotation pipeline, advancing LLM safety research and practical guardrail development.

Findings

01 Lightweight models trained on Aegis 2.0 perform competitively with larger models.

02 The hybrid annotation pipeline improves safety assessment accuracy.

03 A new training blend enhances model generalization to unseen risks.

Abstract

As Large Language Models (LLMs) and generative AI become increasingly widespread, concerns about content safety have grown in parallel. Currently, there is a clear lack of high-quality, human-annotated datasets that address the full spectrum of LLM-related safety risks and are usable for commercial applications. To bridge this gap, we propose a comprehensive and adaptable taxonomy for categorizing safety risks, structured into 12 top-level hazard categories with an extension to 9 fine-grained subcategories. This taxonomy is designed to meet the diverse requirements of downstream users, offering more granular and flexible tools for managing various risk types. Using a hybrid data generation pipeline that combines human annotations with a multi-LLM "jury" system to assess the safety of responses, we obtain Aegis 2.0, a carefully curated collection of 34,248 samples of human-LLM…
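The multi-LLM "jury" mentioned in the abstract can be pictured as several judge models each labeling a response, with the labels then combined into one verdict. The excerpt above does not say how votes are aggregated, so the sketch below assumes a plain majority vote with ties deferred to human review. The names `Juror`, `jury_verdict`, and the `"needs_review"` label are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical juror type: any callable that maps (prompt, response) to a label
# such as "safe" or "unsafe". In practice each juror would wrap an LLM call.
Juror = Callable[[str, str], str]


def jury_verdict(prompt: str, response: str, jurors: Sequence[Juror]) -> str:
    """Combine juror labels by majority vote (an assumed aggregation rule).

    Ties between the two most common labels return "needs_review" so that a
    human annotator can decide.
    """
    if not jurors:
        raise ValueError("at least one juror is required")
    votes = Counter(juror(prompt, response) for juror in jurors)
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "needs_review"
    return ranked[0][0]


# Stand-in jurors for illustration only; real jurors would query models.
def says_safe(prompt: str, response: str) -> str:
    return "safe"


def says_unsafe(prompt: str, response: str) -> str:
    return "unsafe"


majority = jury_verdict("q", "a", [says_safe, says_safe, says_unsafe])
split = jury_verdict("q", "a", [says_safe, says_unsafe])
```

Deferring ties rather than picking a side is one way to keep human annotators in the loop, in the spirit of the hybrid pipeline the abstract describes.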

Code & Models

Models

CTCT-CT2/changeway_guardrails
model· 10 dl· ♡ 2

Datasets

nvidia/Nemotron-Content-Safety-Audio-Dataset
dataset· 1.2k dl

Videos

AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails

Taxonomy

Topics: Transportation Safety and Impact Analysis · Vehicular Ad Hoc Networks (VANETs) · Autonomous Vehicle Technology and Safety