OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence
Jarrod Barnes

TL;DR
OpenSec introduces a new RL environment to evaluate incident response agents' calibration under adversarial evidence, revealing significant over-triggering and calibration gaps in frontier models like GPT-5.2 and Claude Sonnet 4.5.
Contribution
OpenSec provides a realistic benchmark for IR agents that isolates calibration failures under adversarial prompt injection, which existing evaluation methods do not capture.
Findings
GPT-5.2 executes containment in 100% of episodes with an 82.5% false positive rate.
Claude Sonnet 4.5 shows partial calibration with lower false positives.
The calibration failures lie in restraint (knowing when not to act), not in threat detection.
Abstract
As large language models (LLMs) improve, so do their offensive applications: frontier agents now generate working exploits for under $50 in compute (Heelan, 2026). Defensive incident response (IR) agents must keep pace, but existing benchmarks conflate action execution with correct execution, hiding calibration failures when agents process adversarial evidence. We introduce OpenSec, a dual-control reinforcement learning (RL) environment that evaluates IR agents under realistic prompt injection scenarios with execution-based scoring: time-to-first-containment (TTFC), evidence-gated action rate (EGAR), blast radius, and per-tier injection violation rates. Evaluating four frontier models on 40 standard-tier episodes each, we find consistent over-triggering: GPT-5.2 executes containment in 100% of episodes with an 82.5% false positive rate, acting at step 4 before gathering sufficient…
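To make the execution-based metrics named in the abstract concrete, here is a minimal Python sketch of how such scores could be computed from an episode trace. Everything in it is an assumption for illustration: the `Action` and `Episode` records, the `evidence_backed` flag, and the exact metric definitions (TTFC as the step of the first containment action, EGAR as the share of containment actions taken after supporting evidence was gathered, false positive rate as the share of episodes with at least one containment of a non-malicious target) are hypothetical and may differ from how OpenSec defines them.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Action:
    """One agent step in a hypothetical episode trace."""
    step: int                      # 1-indexed step number within the episode
    kind: str                      # "investigate" or "contain" (assumed labels)
    target: str                    # entity the action touches, e.g. a host name
    evidence_backed: bool = False  # assumed flag: evidence gathered before acting


@dataclass
class Episode:
    actions: list       # list of Action
    true_targets: set   # ground-truth entities that should be contained


def containments(ep):
    return [a for a in ep.actions if a.kind == "contain"]


def ttfc(ep) -> Optional[int]:
    """Step of the first containment action, or None if the agent never contains."""
    steps = [a.step for a in containments(ep)]
    return min(steps) if steps else None


def egar(ep) -> Optional[float]:
    """Fraction of containment actions that were backed by gathered evidence."""
    acts = containments(ep)
    if not acts:
        return None
    return sum(a.evidence_backed for a in acts) / len(acts)


def has_false_positive(ep) -> bool:
    """True if any containment action hit a target outside the ground truth."""
    return any(a.target not in ep.true_targets for a in containments(ep))


def summarize(episodes):
    if not episodes:
        raise ValueError("need at least one episode")
    ttfcs = [t for t in map(ttfc, episodes) if t is not None]
    n = len(episodes)
    return {
        "containment_rate": len(ttfcs) / n,
        "false_positive_rate": sum(map(has_false_positive, episodes)) / n,
        "mean_ttfc": mean(ttfcs) if ttfcs else None,
    }


# Two toy episodes: one contains the wrong host early, one gathers evidence first.
episodes = [
    Episode(
        [Action(1, "investigate", "host-a"), Action(4, "contain", "host-b")],
        {"host-a"},
    ),
    Episode(
        [
            Action(1, "investigate", "host-a"),
            Action(2, "investigate", "host-a"),
            Action(6, "contain", "host-a", evidence_backed=True),
        ],
        {"host-a"},
    ),
]
summary = summarize(episodes)
print(summary)
```

Under these assumed definitions, an agent can reach a 100% containment rate while still scoring poorly: the first toy episode counts toward containment but also toward the false positive rate, and its EGAR is zero because the agent acted without evidence implicating the target.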
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
