Towards Optimal Agentic Architectures for Offensive Security Tasks

Isaac David; Arthur Gervais

arXiv:2604.18718 · cs.CR · April 22, 2026

TL;DR

This paper evaluates how agent architecture and coordination topology affect the effectiveness and cost-efficiency of tool-using LLM systems that audit live targets for security vulnerabilities, across web/API and binary targets in whitebox and blackbox modes.

Contribution

The paper introduces a controlled benchmark of 20 interactive targets and a 600-run empirical study of how topology choice affects security task performance and cost, revealing non-monotonic cost-quality trade-offs.

Findings

01. Whitebox mode significantly outperforms blackbox mode in detection rate.

02. Broader coordination can improve coverage, but with diminishing returns once cost is taken into account.

03. Vulnerabilities in web targets are easier to detect than those in binary targets.

Abstract

Agentic security systems increasingly audit live targets with tool-using LLMs, but prior systems fix a single coordination topology, leaving unclear when additional agents help and when they only add cost. We treat topology choice as an empirical systems question. We introduce a controlled benchmark of 20 interactive targets (10 web/API and 10 binary), each exposing one endpoint-reachable ground-truth vulnerability, evaluated in whitebox and blackbox modes. The core study executes 600 runs over five architecture families, three model families, and both access modes, with a separate 60-run long-context pilot reported only in the appendix. On the completed core benchmark, detection-any reaches 58.0% and validated detection reaches 49.8%. MAS-Indep attains the highest validated detection rate (64.2%), while SAS is the strongest efficiency baseline at $0.058 per validated finding. Whitebox…
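The run count in the abstract is consistent with a full factorial design: 5 architecture families × 3 model families × 2 access modes × 20 targets = 600. The sketch below enumerates such a grid and defines a cost-per-validated-finding helper. It is an illustration under stated assumptions, not the authors' harness: it assumes exactly one run per grid cell, and it assumes the efficiency metric is total cost divided by the number of validated findings. Only `SAS` and `MAS-Indep` are names taken from the abstract; every other label (`arch_3`, `model_a`, `web_00`, and so on) is a hypothetical placeholder.

```python
from itertools import product

# Only "SAS" and "MAS-Indep" appear in the abstract; the remaining
# architecture labels are hypothetical placeholders.
ARCHITECTURES = ["SAS", "MAS-Indep", "arch_3", "arch_4", "arch_5"]
# The abstract does not name the model families; these are placeholders.
MODEL_FAMILIES = ["model_a", "model_b", "model_c"]
ACCESS_MODES = ["whitebox", "blackbox"]
# 10 web/API targets and 10 binary targets, with placeholder identifiers.
TARGETS = [f"web_{i:02d}" for i in range(10)] + [f"bin_{i:02d}" for i in range(10)]


def run_grid():
    """Enumerate every (architecture, model, mode, target) cell once.

    Assumes one run per cell, which is what makes the total come to 600.
    """
    return list(product(ARCHITECTURES, MODEL_FAMILIES, ACCESS_MODES, TARGETS))


def cost_per_validated_finding(total_cost_usd, validated_findings):
    """Assumed metric: total spend divided by the validated-finding count.

    Returns infinity when nothing was validated, so that such a
    configuration ranks last on efficiency.
    """
    if validated_findings == 0:
        return float("inf")
    return total_cost_usd / validated_findings


grid = run_grid()
print(len(grid))  # 5 * 3 * 2 * 20 = 600

# Runs per architecture family under the one-run-per-cell assumption.
runs_per_architecture = len(grid) // len(ARCHITECTURES)
print(runs_per_architecture)  # 120
```

Under this reading each architecture family accounts for 120 runs, so per-architecture detection rates such as the one reported for MAS-Indep would be fractions over 120 runs.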
