When Skills Don't Help: A Negative Result on Procedural Knowledge for Tool-Grounded Agents in Offensive Cybersecurity
Samuel Jacob Chacko, James Hugglestone, Chashi Mahiul Islam, Xiuwen Liu

TL;DR
This study shows that in offensive cybersecurity, procedural Skills provide minimal benefit when environment feedback is rich and immediate, challenging assumptions about their general usefulness in AI agents.
Contribution
The paper re-analyzes a 180-run controlled study of an MCP-grounded Capture-the-Flag agent and argues that rich, immediate environment feedback reduces the marginal benefit of Skills for tool-grounded agents in cybersecurity tasks.
Findings
Skills have negligible impact in cybersecurity tasks with rich environment feedback.
Marginal benefit of Skills diminishes as environment feedback bandwidth increases.
In some cases, Skills can actively degrade agent performance.
Abstract
Agent Skills, structured packages of procedural knowledge loaded into an LLM agent at inference time, are widely reported to improve task pass rates by an average of 16.2 percentage points across diverse domains. Yet the same benchmarks show wide variance, with 16 of 84 tasks suffering negative deltas when Skills are introduced. The community has not yet articulated a clean mechanism for when Skills help and when they are merely redundant overhead. We re-analyze a recently published 180-run controlled study of an MCP-grounded autonomous Capture-the-Flag (CTF) agent under four documentation conditions of increasing richness (55, 1,478, 1,976, and 4,147 lines), and show that these conditions correspond almost exactly to a No-Skills, Experiential-Skills, Curated-Skills, and Comprehensive-Skills ablation. In offensive cybersecurity, a domain not deeply covered by existing…
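The figures quoted in the abstract can be put side by side with a short sketch. It only restates numbers given above (the four documentation sizes and the 16-of-84 negative-delta count); the function names and the pairing of each size with an ablation label in listed order are assumptions made for illustration, not the authors' analysis code.

```python
# Documentation conditions from the abstract: (ablation label, lines of documentation).
# Pairing sizes with labels in the order listed is an assumption.
CONDITIONS = [
    ("No-Skills", 55),
    ("Experiential-Skills", 1478),
    ("Curated-Skills", 1976),
    ("Comprehensive-Skills", 4147),
]

# From the abstract: 16 of 84 benchmark tasks showed negative deltas with Skills.
NEGATIVE_DELTA_TASKS = 16
TOTAL_TASKS = 84


def negative_delta_share(negative: int, total: int) -> float:
    """Fraction of tasks whose pass rate dropped when Skills were introduced."""
    return negative / total


def richness_ratios(conditions):
    """Documentation size of each condition relative to the smallest one."""
    baseline = min(lines for _, lines in conditions)
    return {name: lines / baseline for name, lines in conditions}


if __name__ == "__main__":
    share = negative_delta_share(NEGATIVE_DELTA_TASKS, TOTAL_TASKS)
    print(f"negative-delta share: {share:.1%}")
    for name, ratio in richness_ratios(CONDITIONS).items():
        print(f"{name}: {ratio:.1f}x the smallest condition")
```

Read this way, roughly one task in five regresses in the cited benchmarks, and the richest documentation condition is about 75 times the size of the smallest.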
