Cochise: A Reference Harness for Autonomous Penetration Testing
Andreas Happe, Jürgen Cito

TL;DR
Cochise is a lightweight Python framework for autonomous penetration-testing experiments. It supports controlled target environments and includes analysis tools, so LLM-driven agents can be evaluated without extensive infrastructure.
Contribution
It introduces a minimal, reusable reference harness with a separated Planner-Executor architecture and analysis tools, facilitating systematic comparison of penetration-testing agents.
Findings
Successfully tested against the GOAD testbed
Provides tools for offline visualization and analysis of agent runs
Enables comparison of different models and architectures in penetration testing
Abstract
Recent work on LLM-driven autonomous penetration testing reports promising results, but existing systems often combine many architectural, prompting, and tool-integration choices, making it difficult to tell what is gained over a simple agent scaffold. We present cochise, a 597 LOC Python reference harness for autonomous penetration-testing experiments. Cochise connects an LLM-driven agent to a Linux execution host over SSH and supports controlled target environments reachable from that jump host. The prototype implements a separated Planner--Executor architecture in which long-term state is maintained outside the LLM context, while a ReAct-style executor issues commands over SSH and self-corrects based on command outputs. The scenario prompt can be adapted to different target environments. To demonstrate the efficacy of our minimal harness, we evaluate it against a live third-party…
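The separated Planner-Executor idea described in the abstract can be illustrated with a minimal sketch. This is not cochise's code or API: every name here (`PlanState`, `run_executor`, `run_planner`, `scripted_decide`, `fake_run_command`) is hypothetical, the LLM is replaced by a scripted decision function, and the SSH connection is replaced by a stub callable that returns canned output. It only shows the shape of the loop: long-term state lives in a plain object outside any model context, while a ReAct-style executor picks a command, observes the output, and corrects itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

History = List[Tuple[str, str]]  # (command, output) pairs for one task


@dataclass
class PlanState:
    """Long-term state kept outside any LLM context (hypothetical)."""
    tasks: List[str]
    findings: List[str] = field(default_factory=list)


def run_executor(
    task: str,
    decide: Callable[[str, History], Optional[str]],
    run_command: Callable[[str], str],
    max_steps: int = 5,
) -> History:
    """ReAct-style loop: choose a command, run it, observe, repeat."""
    history: History = []
    for _ in range(max_steps):
        command = decide(task, history)  # stands in for an LLM call
        if command is None:              # executor considers the task done
            break
        output = run_command(command)    # stands in for a command sent over SSH
        history.append((command, output))
    return history


def run_planner(state: PlanState, decide, run_command) -> PlanState:
    """Hand tasks to the executor one by one and keep only a short summary."""
    while state.tasks:
        task = state.tasks.pop(0)
        history = run_executor(task, decide, run_command)
        if history:
            # Only the last observation is stored as long-term state.
            state.findings.append(f"{task}: {history[-1][1]}")
    return state


def scripted_decide(task: str, history: History) -> Optional[str]:
    """Scripted stand-in for the model: makes a typo, then corrects it."""
    if not history:
        return "lss /tmp"                # deliberate mistake
    if "not found" in history[-1][1]:
        return "ls /tmp"                 # self-correction based on the output
    return None


def fake_run_command(command: str) -> str:
    """Stub for remote execution; returns canned output, runs nothing."""
    if command.startswith("lss"):
        return "sh: lss: command not found"
    return "notes.txt"


state = run_planner(PlanState(tasks=["list files"]), scripted_decide, fake_run_command)
print(state.findings)  # ['list files: notes.txt']
```

Keeping `run_command` as an injected callable is a choice made for this sketch: it lets the loop be exercised offline with canned output, and a real transport could be swapped in without touching the planner or executor logic.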
