AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications

Bhaktipriya Radharapu; Kevin Robinson; Lora Aroyo; Preethi Lahoti

arXiv:2311.08592 · cs.SE · December 1, 2023 · 2 cites

TL;DR

AART is an automated, AI-assisted red-teaming framework that generates diverse adversarial datasets for evaluating the safety of large language model (LLM) generations in new downstream applications, reducing human effort and enabling adversarial testing earlier in product development.

Contribution

It introduces an automated pipeline for adversarial data generation built from reusable, customizable recipes, improving diversity and scalability over manual red-teaming.

Findings

1. Achieves higher concept coverage than existing tools
2. Generates diverse content tailored to cultural and application contexts
3. Reduces human effort in adversarial dataset creation
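The first finding reports higher concept coverage, but this page does not define the metric. Purely as an illustration, the sketch below assumes coverage means the fraction of a target concept list that is mentioned in at least one generated prompt. The function name `concept_coverage`, the case-insensitive substring matching rule, and the placeholder concept names are all hypothetical, not taken from the paper.

```python
"""Illustrative concept-coverage metric (an assumption, not AART's definition)."""
from typing import Iterable, Set


def concept_coverage(prompts: Iterable[str], target_concepts: Set[str]) -> float:
    """Return the fraction of target concepts mentioned in at least one prompt.

    Matching is a naive case-insensitive substring test; a real evaluation
    would more likely rely on a classifier or human annotators.
    """
    if not target_concepts:
        return 0.0
    lowered = [p.lower() for p in prompts]
    # A concept counts as covered if any prompt contains it.
    covered = {c for c in target_concepts if any(c.lower() in p for p in lowered)}
    return len(covered) / len(target_concepts)


targets = {"concept_a", "concept_b", "concept_c", "concept_d"}
prompts = [
    "a request about concept_a",
    "another request about CONCEPT_B",
    "a request about concept_a again",
]
print(concept_coverage(prompts, targets))  # 0.5 (2 of 4 concepts covered)
```

Under this definition, repeating a concept does not raise coverage; only reaching new concepts does, which is why a metric like this rewards diversity rather than volume.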

Abstract

Adversarial testing of large language models (LLMs) is crucial for their safe and responsible deployment. We introduce a novel approach for automated generation of adversarial evaluation datasets to test the safety of LLM generations on new downstream applications. We call it AI-assisted Red-Teaming (AART), an automated alternative to current manual red-teaming efforts. AART offers a data generation and augmentation pipeline of reusable and customizable recipes that reduce human effort significantly and enable integration of adversarial testing earlier in new product development. AART generates evaluation datasets with high diversity of content characteristics critical for effective adversarial testing (e.g., sensitive and harmful concepts, specific to a wide range of cultural and geographic regions and application scenarios). The data generation is steered by AI-assisted recipes to…
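The abstract describes data generation steered by reusable, customizable recipes that target diversity across sensitive concepts, regions, and application scenarios. As a rough illustration only, the sketch below assumes a recipe is a set of named diversity dimensions crossed exhaustively, with a template function standing in for the LLM that AART would actually use to write each query. `Recipe`, `template_prompt`, and every dimension name and value are hypothetical placeholders, not the paper's code or taxonomy.

```python
"""Hypothetical sketch of a recipe-driven adversarial data generator.

Not AART's implementation: the Recipe class, the dimension names, and the
template-based stand-in for an LLM call are illustrative assumptions.
"""
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List


@dataclass
class Recipe:
    # Each dimension maps a name (e.g. "concept") to the values to cover.
    dimensions: Dict[str, List[str]]
    # Turns one combination of dimension values into a prompt string.
    render: Callable[[Dict[str, str]], str]

    def generate(self) -> List[Dict[str, str]]:
        names = list(self.dimensions)
        rows = []
        # Cross product over all dimensions, so every combination appears once.
        for values in product(*(self.dimensions[n] for n in names)):
            spec = dict(zip(names, values))
            rows.append({**spec, "prompt": self.render(spec)})
        return rows


def template_prompt(spec: Dict[str, str]) -> str:
    # Placeholder for an LLM call; a real pipeline would ask a model to write
    # a query matching this specification instead of filling a template.
    return (f"[{spec['task']}] a request touching on {spec['concept']} "
            f"framed for a user in {spec['region']}")


recipe = Recipe(
    dimensions={
        "concept": ["concept_a", "concept_b"],
        "region": ["region_x", "region_y", "region_z"],
        "task": ["task_1", "task_2"],
    },
    render=template_prompt,
)
dataset = recipe.generate()
print(len(dataset))  # 12 (2 concepts * 3 regions * 2 tasks)
print(dataset[0]["prompt"])
```

Crossing dimensions explicitly makes diversity a property of the recipe rather than of chance sampling; swapping the dimension lists is enough to retarget the same recipe at a different application or region.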

Taxonomy

Topics: Software Engineering Research · Adversarial Robustness in Machine Learning