Cochise: A Reference Harness for Autonomous Penetration Testing

Andreas Happe, Jürgen Cito

arXiv:2605.11671 · cs.CR · May 13, 2026

TL;DR

Cochise is a lightweight Python harness for autonomous penetration-testing experiments. It connects LLM-driven agents to controlled target environments and includes analysis tools, so agents can be evaluated without extensive infrastructure.

Contribution

The paper introduces a minimal, reusable reference harness with a separated Planner-Executor architecture and accompanying analysis tools, which makes systematic comparison of penetration-testing agents easier.

Findings

01 Successfully tested against the GOAD testbed

02 Provides tools for offline visualization and analysis of agent runs

03 Enables comparison of different models and architectures in penetration testing

Abstract

Recent work on LLM-driven autonomous penetration testing reports promising results, but existing systems often combine many architectural, prompting, and tool-integration choices, making it difficult to tell what is gained over a simple agent scaffold. We present cochise, a 597 LOC Python reference harness for autonomous penetration-testing experiments. Cochise connects an LLM-driven agent to a Linux execution host over SSH and supports controlled target environments reachable from that jump host. The prototype implements a separated Planner-Executor architecture in which long-term state is maintained outside the LLM context, while a ReAct-style executor issues commands over SSH and self-corrects based on command outputs. The scenario prompt can be adapted to different target environments. To demonstrate the efficacy of our minimal harness, we evaluate it against a live third-party…
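
The separated Planner-Executor idea described in the abstract can be illustrated with a small sketch. This is not the cochise implementation: every name below (PlanState, execute_task, run_plan, fake_llm, fake_run) is hypothetical, the LLM is replaced by a scripted stub, and the SSH session is replaced by a local function so the example runs without a network or an API key. It only shows the shape of the loop: plan state lives in an ordinary object outside the model context, and a ReAct-style executor picks a command, reads its output, and corrects itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type aliases; none of these names come from the cochise code base.
History = List[Tuple[str, str]]                # (command, output) pairs
RunCommand = Callable[[str], str]              # stand-in for "run this over SSH"
ChooseAction = Callable[[str, History], str]   # stand-in for an LLM call


@dataclass
class PlanState:
    """Long-term state kept outside the LLM context (here: plain lists)."""
    pending: List[str] = field(default_factory=list)
    findings: List[str] = field(default_factory=list)


def execute_task(task: str, choose_action: ChooseAction,
                 run_command: RunCommand, max_steps: int = 5) -> str:
    """ReAct-style loop: choose a command, observe its output, repeat.

    Returns a short summary that the planner stores in its own state.
    """
    history: History = []
    for _ in range(max_steps):
        action = choose_action(task, history)
        if action.startswith("DONE"):
            return action[len("DONE"):].strip()
        history.append((action, run_command(action)))
    return "step limit reached"


def run_plan(state: PlanState, choose_action: ChooseAction,
             run_command: RunCommand) -> PlanState:
    """Planner loop: hand tasks to the executor one at a time, keep only summaries."""
    while state.pending:
        task = state.pending.pop(0)
        summary = execute_task(task, choose_action, run_command)
        state.findings.append(f"{task}: {summary}")
    return state


# --- Scripted stand-ins so the sketch runs offline (assumptions, not real tools) ---

def fake_llm(task: str, history: History) -> str:
    if not history:
        return "nmapp -sn 10.0.0.0/24"         # deliberate typo in the first attempt
    last_output = history[-1][1]
    if "command not found" in last_output:
        return "nmap -sn 10.0.0.0/24"          # self-correction based on the output
    return "DONE " + last_output


def fake_run(command: str) -> str:
    if command.startswith("nmap "):
        return "2 hosts up"                    # canned output, not a real scan result
    return "bash: command not found"


state = run_plan(PlanState(pending=["discover live hosts"]), fake_llm, fake_run)
print(state.findings)
```

In a real setup, `fake_run` would be swapped for a function that executes the command on the Linux jump host, and `fake_llm` for a model call that receives the scenario prompt. Because the executor returns only a summary, the per-task command history is discarded after each task and the planner state stays small.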
