GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection

Melissa Kazemi Rad; Alberto Purpura; Himanshu Kumar; Emily Chen; Mohammad Shahed Sorower

arXiv:2508.17057 · cs.CL · August 26, 2025

TL;DR

GRAID is a novel data augmentation pipeline that uses geometric constraints and multi-agent reflection with LLMs to improve harmful content detection by addressing data scarcity and exploring edge cases.

Contribution

GRAID introduces a two-stage augmentation process combining geometric control and multi-agent reflection to enhance harmful text classification datasets.

Findings

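01. Augmenting a harmful text classification dataset with GRAID significantly improves downstream guardrail model performance on two benchmark datasets.

02. The multi-agentic reflective stage increases stylistic diversity and uncovers edge cases.

03. LLM-based augmentation mitigates data scarcity in harmful text classification for guardrailing applications.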

Abstract

We address the problem of data scarcity in harmful text classification for guardrailing applications and introduce GRAID (Geometric and Reflective AI-Driven Data Augmentation), a novel pipeline that leverages Large Language Models (LLMs) for dataset augmentation. GRAID consists of two stages: (i) generation of geometrically controlled examples using a constrained LLM, and (ii) augmentation through a multi-agentic reflective process that promotes stylistic diversity and uncovers edge cases. This combination enables both reliable coverage of the input space and nuanced exploration of harmful content. Using two benchmark datasets, we demonstrate that augmenting a harmful text classification dataset with GRAID leads to significant improvements in downstream guardrail model performance.
