RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

Quy-Anh Dang; Chris Ngo; Truong-Son Hy

arXiv:2601.03699 · cs.CL · April 20, 2026

TL;DR

RedBench is a comprehensive, standardized dataset designed to evaluate large language models' robustness against adversarial prompts across multiple domains and risk categories.

Contribution

It introduces a unified dataset with a standardized taxonomy, addressing limitations of prior datasets and enabling systematic vulnerability assessment of LLMs.

Findings

01 RedBench aggregates 37 datasets with 29,362 samples.

02 It establishes baselines for modern LLMs.

03 The dataset and evaluation code are open-sourced.

Abstract

As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount. However, existing red teaming datasets suffer from inconsistent risk categorizations, limited domain coverage, and outdated evaluations, hindering systematic vulnerability assessments. To address these challenges, we introduce RedBench, a universal dataset aggregating 37 benchmark datasets from leading conferences and repositories, comprising 29,362 samples across attack and refusal prompts. RedBench employs a standardized taxonomy with 22 risk categories and 19 domains, enabling consistent and comprehensive evaluations of LLM vulnerabilities. We provide a detailed analysis of existing datasets, establish baselines for modern LLMs, and open-source the dataset and evaluation code. Our contributions facilitate robust comparisons, foster…
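
To make the idea of a standardized taxonomy concrete, here is a minimal sketch of how samples from differently labeled source datasets might be normalized into one schema and counted per risk category. It is not the authors' code: the `Sample` fields, the `LABEL_MAP` entries, the dataset names, and the category names are all hypothetical placeholders, and the actual RedBench taxonomy of 22 risk categories and 19 domains is not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping: (source dataset, source label) -> unified risk category.
# These names are illustrative only and are NOT RedBench's real taxonomy.
LABEL_MAP = {
    ("dataset_a", "hate"): "hate_speech",
    ("dataset_b", "Hateful content"): "hate_speech",
    ("dataset_b", "Malware"): "cybersecurity",
}


@dataclass(frozen=True)
class Sample:
    """One prompt in the unified (assumed) schema."""
    prompt: str
    source: str    # name of the originating dataset
    category: str  # unified risk category
    domain: str    # unified domain
    kind: str      # "attack" or "refusal"


def normalize(raw: dict, source: str) -> Sample:
    """Map one raw record onto the unified schema.

    Labels without a mapping are tagged "unmapped" so they can be
    reviewed instead of being silently dropped.
    """
    category = LABEL_MAP.get((source, raw["label"]), "unmapped")
    return Sample(
        prompt=raw["text"],
        source=source,
        category=category,
        domain=raw.get("domain", "general"),
        kind=raw.get("kind", "attack"),
    )


def aggregate(datasets: dict) -> list:
    """Flatten {dataset name: [raw records]} into a list of Samples."""
    return [
        normalize(raw, source)
        for source, rows in datasets.items()
        for raw in rows
    ]


def category_counts(samples: list) -> Counter:
    """Count samples per unified risk category."""
    return Counter(sample.category for sample in samples)
```

With a mapping like this, two datasets that label the same risk differently (for example "hate" and "Hateful content") end up in a single category, which is what makes per-category comparisons across sources possible.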

Code & Models

Repositories

knoveleng/redeval (GitHub)
