Generative Artificial Intelligence-Supported Pentesting: A Comparison between Claude Opus, GPT-4, and Copilot
Antonio López Martínez, Alejandro Cano, Antonio Ruiz-Martínez

TL;DR
This paper compares Claude Opus, GPT-4, and Copilot in supporting penetration testing, showing that while they can't fully automate it, they improve efficiency and effectiveness, with Claude Opus performing best.
Contribution
It provides a systematic evaluation of leading GenAI tools in augmenting pentesting across all PTES phases, highlighting their strengths and limitations.
Findings
Claude Opus outperformed GPT-4 and Copilot in experiments.
All tools enhanced efficiency in specific pentesting tasks.
Tools cannot fully automate the pentesting process.
Abstract
The advent of Generative Artificial Intelligence (GenAI) has brought a significant change to our society. GenAI can be applied across numerous fields, with particular relevance in cybersecurity. Among the various areas of application, its use in penetration testing (pentesting) or ethical hacking processes is of special interest. In this paper, we have analyzed the potential of leading generic-purpose GenAI tools (Claude Opus, GPT-4 from ChatGPT, and Copilot) in augmenting the penetration testing process as defined by the Penetration Testing Execution Standard (PTES). Our analysis involved evaluating each tool across all PTES phases within a controlled virtualized environment. The findings reveal that, while these tools cannot fully automate the pentesting process, they provide substantial support by enhancing efficiency and effectiveness in specific tasks. Notably, all tools demonstrated…
Taxonomy
Methods: Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Multi-Head Attention · Position-Wise Feed-Forward Layer
