Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Shaona Ghosh, Prasoon Varshney, Makesh Narsimhan Sreedhar, Aishwarya Padmakumar, Traian Rebedea, Jibin Rajan Varghese, Christopher Parisien

TL;DR
Aegis2.0 introduces a comprehensive, human-annotated safety dataset and a detailed risk taxonomy for LLMs, enabling more adaptable and effective safety guardrails through lightweight models and hybrid annotation methods.
Contribution
The paper presents a new safety taxonomy, a large annotated dataset, and a hybrid annotation pipeline, advancing LLM safety research and practical guardrail development.
Findings
Lightweight models trained on Aegis 2.0 perform competitively with larger models.
The hybrid annotation pipeline improves safety assessment accuracy.
A new training blend enhances model generalization to unseen risks.
Abstract
As Large Language Models (LLMs) and generative AI become increasingly widespread, concerns about content safety have grown in parallel. Currently, there is a clear lack of high-quality, human-annotated datasets that address the full spectrum of LLM-related safety risks and are usable for commercial applications. To bridge this gap, we propose a comprehensive and adaptable taxonomy for categorizing safety risks, structured into 12 top-level hazard categories with an extension to 9 fine-grained subcategories. This taxonomy is designed to meet the diverse requirements of downstream users, offering more granular and flexible tools for managing various risk types. Using a hybrid data generation pipeline that combines human annotations with a multi-LLM "jury" system to assess the safety of responses, we obtain Aegis 2.0, a carefully curated collection of 34,248 samples of human-LLM…
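The multi-LLM "jury" mentioned in the abstract can be pictured as several judge models voting on whether a response is safe. The following is only a minimal sketch under stated assumptions: the abstract does not say how the votes are combined, so the majority rule, the tie-breaking default, the label strings, and the stub judges (`always_safe`, `keyword_judge`, `strict_keyword_judge`) are all hypothetical stand-ins rather than the authors' method.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical judge interface: (prompt, response) -> "safe" or "unsafe".
# In practice each judge would wrap a call to a different LLM.
Judge = Callable[[str, str], str]

VALID_LABELS = {"safe", "unsafe"}


def jury_verdict(prompt: str, response: str, judges: Sequence[Judge],
                 tie_label: str = "unsafe") -> str:
    """Combine judge votes by simple majority (an assumed aggregation rule)."""
    if not judges:
        raise ValueError("at least one judge is required")
    votes = Counter()
    for judge in judges:
        label = judge(prompt, response)
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        votes[label] += 1
    if votes["safe"] == votes["unsafe"]:
        # Assumption: ties resolve conservatively to the tie_label.
        return tie_label
    return "safe" if votes["safe"] > votes["unsafe"] else "unsafe"


# Stub judges standing in for real LLM calls (purely illustrative).
def always_safe(prompt: str, response: str) -> str:
    return "safe"


def keyword_judge(prompt: str, response: str) -> str:
    return "unsafe" if "explosive" in response.lower() else "safe"


def strict_keyword_judge(prompt: str, response: str) -> str:
    text = (prompt + " " + response).lower()
    return "unsafe" if "explosive" in text else "safe"


JURY = [always_safe, keyword_judge, strict_keyword_judge]
```

Calling `jury_verdict(prompt, response, JURY)` returns a single label per response. Under this sketch, samples where the jury disagrees could also be routed to human annotators, which is one way a hybrid human-plus-LLM pipeline might be organized.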
Taxonomy
Topics: Transportation Safety and Impact Analysis · Vehicular Ad Hoc Networks (VANETs) · Autonomous Vehicle Technology and Safety
