Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing

Manish Bhatt; Adrian Wood; Idan Habler; Ammar Al-Kahfah

arXiv:2601.00042·cs.CR·January 7, 2026

Open Access

TL;DR

This study adapts the Go-Explore algorithm for security testing of AI agents with tool use, revealing seed variance impacts, the effects of reward shaping, and the benefits of ensembles for comprehensive evaluation.

Contribution

It introduces an adaptation of Go-Explore for security testing of tool-using LLM agents, highlighting the importance of seed variance, simple state signatures, and ensemble methods.

Findings

01

Random-seed variance produces up to an 8x spread in outcomes, so single-seed comparisons are unreliable.

02

Reward shaping consistently harms performance, causing exploration collapse or false positives.

03

Ensembles increase attack-type diversity, whereas single agents optimize coverage within a given attack type.

Abstract

Production LLM agents with tool-using capabilities require security testing despite their safety training. We adapt Go-Explore to evaluate GPT-4o-mini across 28 experimental runs spanning six research questions. We find that random-seed variance dominates algorithmic parameters, yielding an 8x spread in outcomes; single-seed comparisons are unreliable, while multi-seed averaging materially reduces variance in our setup. Reward shaping consistently harms performance, causing exploration collapse in 94% of runs or producing 18 false positives with zero verified attacks. In our environment, simple state signatures outperform complex ones. For comprehensive security testing, ensembles provide attack-type diversity, whereas single agents optimize coverage within a given attack type. Overall, these results suggest that seed variance and targeted domain knowledge can outweigh algorithmic…
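The abstract's point about single-seed comparisons can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `run_trial` is a hypothetical toy stand-in for one testing run, not the paper's environment or code, and the numbers it produces have no relation to the reported results. The sketch only shows how one might report a max/min spread and a standard error across seeds instead of a single-seed outcome.

```python
import random
import statistics


def run_trial(seed: int) -> int:
    """Hypothetical stand-in for one red-team run (NOT the paper's setup).

    Returns a toy count of findings driven only by the seed.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(50) if rng.random() < 0.1)


def spread_ratio(outcomes):
    """Ratio of best to worst outcome; infinite if the worst outcome is zero."""
    lo = min(outcomes)
    return float("inf") if lo == 0 else max(outcomes) / lo


def summarize(seeds):
    """Run one trial per seed and report mean, standard error, and spread."""
    outcomes = [run_trial(s) for s in seeds]
    n = len(outcomes)
    mean = statistics.mean(outcomes)
    # The standard error shrinks roughly as 1/sqrt(n), which is why averaging
    # over seeds gives a steadier estimate than any single seed.
    sem = statistics.stdev(outcomes) / n ** 0.5 if n > 1 else float("nan")
    return {"outcomes": outcomes, "mean": mean, "sem": sem,
            "spread": spread_ratio(outcomes)}


if __name__ == "__main__":
    print(summarize(range(10)))
```

Comparing two configurations by their multi-seed means and standard errors, rather than by one run each, is the kind of practice the abstract's finding argues for.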

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques