SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning

Kaiwen Zhou; Ahmed Elgohary; A S M Iftekhar; Amin Saied

arXiv:2510.26037 · cs.CR · October 31, 2025

TL;DR

SIRAJ is a comprehensive red-teaming framework for LLM agents that generates diverse risk scenarios, refines adversarial attacks iteratively, and employs model distillation to create efficient, high-performing smaller red-teaming models.

Contribution

The paper introduces a dynamic two-step red-teaming process for LLM agents, together with a distillation approach that uses structured forms of a teacher model's reasoning to train smaller, cheaper red-teaming models.

Findings

01. Seed test case generation increases risk coverage by 2–2.5x.

02. Distilled 8B red-teamer improves attack success rate by 100%.

03. Framework effectively generalizes across diverse LLM settings.

Abstract

The ability of LLM agents to plan and invoke tools exposes them to new safety risks, making a comprehensive red-teaming system crucial for discovering vulnerabilities and ensuring their safe deployment. We present SIRAJ: a generic red-teaming framework for arbitrary black-box LLM agents. We employ a dynamic two-step process that starts with an agent definition and generates diverse seed test cases that cover various risk outcomes, tool-use trajectories, and risk sources. Then, it iteratively constructs and refines model-based adversarial attacks based on the execution trajectories of former attempts. To optimize the red-teaming cost, we present a model distillation approach that leverages structured forms of a teacher model's reasoning to train smaller models that are equally effective. Across diverse evaluation agent settings, our seed test case generation approach yields 2–2.5x…
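The two-step process described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: every name here (`SeedCase`, `generate_seeds`, `red_team`, and the `toy_*` stand-ins) is hypothetical, and the seed generator, attacker, target agent, and judge, which would presumably be LLM-backed in practice, are replaced by simple deterministic functions so the control flow can be run as-is.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class SeedCase:
    """Hypothetical seed test case: one point in the coverage space."""
    risk_outcome: str
    tool_trajectory: Tuple[str, ...]
    risk_source: str


@dataclass
class Attempt:
    attack: str
    trajectory: List[str]
    succeeded: bool


def generate_seeds(outcomes: Sequence[str],
                   trajectories: Sequence[Sequence[str]],
                   sources: Sequence[str]) -> List[SeedCase]:
    # Step 1 (toy version): one seed per combination, so every risk outcome,
    # tool-use trajectory, and risk source appears at least once.
    return [SeedCase(o, tuple(t), s)
            for o, t, s in product(outcomes, trajectories, sources)]


def red_team(seed: SeedCase,
             agent: Callable[[str], List[str]],
             propose: Callable[[SeedCase, List[Attempt]], str],
             judge: Callable[[SeedCase, List[str]], bool],
             max_iters: int = 3) -> List[Attempt]:
    # Step 2 (toy version): propose an attack, run it against the black-box
    # agent, and feed the observed trajectory into the next proposal.
    history: List[Attempt] = []
    for _ in range(max_iters):
        attack = propose(seed, history)
        trajectory = agent(attack)
        attempt = Attempt(attack, trajectory, judge(seed, trajectory))
        history.append(attempt)
        if attempt.succeeded:
            break
    return history


# --- Hypothetical stand-ins for the agent, attacker, and judge ---

def toy_agent(prompt: str) -> List[str]:
    # Assumed target behavior: performs the risky call only if given a pretext.
    if "pretext" in prompt:
        return ["read_email", "send_money"]
    return ["read_email", "refuse"]


def toy_propose(seed: SeedCase, history: List[Attempt]) -> str:
    base = f"Please cause: {seed.risk_outcome}"
    # Refine only when the previous trajectory shows a refusal.
    if history and "refuse" in history[-1].trajectory:
        return base + " [pretext added after refusal]"
    return base


def toy_judge(seed: SeedCase, trajectory: List[str]) -> bool:
    # Success means the agent followed the seed's target tool trajectory.
    return list(seed.tool_trajectory) == trajectory


seeds = generate_seeds(["financial loss"],
                       [["read_email", "send_money"]],
                       ["user prompt", "tool output"])
history = red_team(seeds[0], toy_agent, toy_propose, toy_judge)
```

In this toy run the first attack is refused and the second, refined using the refusal observed in the previous trajectory, succeeds. The point of the sketch is only the loop structure: the attacker sees the target solely through its execution trajectories, which matches the black-box setting the abstract describes.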
