Towards Automated Pentesting with Large Language Models

Ricardo Bessa; Rui Claro; João Trindade; João Lourenço

arXiv:2604.11772 · cs.CR · April 14, 2026

TL;DR

This paper introduces RedShell, a framework that uses fine-tuned LLMs to automate PowerShell-based pentesting on Windows, achieving over 90% syntactic validity and strong semantic alignment with reference pentesting snippets.

Contribution

RedShell is a novel, privacy-preserving framework that enhances automated pentesting with fine-tuned LLMs, outperforming existing methods in code validity and similarity metrics.

Findings

01. RedShell achieves over 90% syntactic validity in generated code.

02. RedShell's code snippets show an average similarity above 50% to reference samples.

03. Functional tests confirm reliable execution of generated pentesting scripts.

Abstract

Large Language Models (LLMs) are redefining offensive cybersecurity by allowing the generation of harmful machine code with minimal human intervention. While attackers take advantage of dark LLMs such as XXXGPT and WolfGPT to produce malicious code, ethical hackers can follow similar approaches to automate traditional pentesting workflows. In this work, we present RedShell, a privacy-preserving, hardware-efficient framework that leverages fine-tuned LLMs to assist pentesters in generating offensive PowerShell code targeting Microsoft Windows vulnerabilities. RedShell was trained on a malicious PowerShell dataset from the literature, which we further enhanced with manually curated code samples. Experiments show that our framework achieves over 90% syntactic validity in generated samples and strong semantic alignment with reference pentesting snippets, outperforming state-of-the-art…
