PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI
Haitham S. Al-Sinani, Chris J. Mitchell

TL;DR
PenTest2.0 advances autonomous ethical hacking by using large language models to drive privilege escalation. It incorporates retrieval-augmented generation, reasoning prompts, and goal tracking, and demonstrates multi-turn adaptive attacks on Linux systems.
Contribution
The paper introduces PenTest2.0, a system that automates privilege escalation using generative AI. New features such as retrieval-augmented generation and task trees are intended to improve scalability and effectiveness.
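To make the task-tree idea concrete, here is a minimal sketch of how goals and sub-goals might be tracked across turns. It is not the authors' implementation: the class name `TaskNode`, its fields, the status strings, and the example goals are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskNode:
    """One goal or sub-goal in a hypothetical task tree (illustrative only)."""
    goal: str
    status: str = "pending"  # assumed states: "pending", "done", "failed"
    children: List["TaskNode"] = field(default_factory=list)

    def add(self, goal: str) -> "TaskNode":
        """Attach a new pending sub-goal and return it."""
        child = TaskNode(goal)
        self.children.append(child)
        return child

    def next_pending(self) -> Optional["TaskNode"]:
        """Depth-first search for the first pending leaf under a pending node."""
        if self.status != "pending":
            return None
        if not self.children:
            return self
        for child in self.children:
            found = child.next_pending()
            if found is not None:
                return found
        return None

    def render(self, depth: int = 0) -> str:
        """Indented text view of the tree, one goal per line."""
        lines = ["  " * depth + f"[{self.status}] {self.goal}"]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


# Hypothetical example goals; a real tree would be built from the session.
root = TaskNode("obtain root shell")
enum = root.add("enumerate system")
enum.add("list sudo rights")
enum.add("list SUID binaries")
root.add("attempt escalation")
```

In a design like this, the text from `root.render()` could be included in each prompt so that the model sees which sub-goals are done and which remain, and `root.next_pending()` would pick the sub-goal to work on next. Whether PenTest2.0 stores or presents its task trees this way is an assumption.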
Findings
Successfully performed multi-turn privilege escalation on Linux
Demonstrated benefits of retrieval-augmented generation and reasoning prompts
Identified limitations of generative AI in ethical hacking contexts
Abstract
Ethical hacking today relies on highly skilled practitioners executing complex sequences of commands, which is inherently time-consuming, difficult to scale, and prone to human error. To help mitigate these limitations, we previously introduced 'PenTest++', an AI-augmented system combining automation with generative AI supporting ethical hacking workflows. However, a key limitation of PenTest++ was its lack of support for privilege escalation, a crucial element of ethical hacking. In this paper we present 'PenTest2.0', a substantial evolution of PenTest++ supporting automated privilege escalation driven entirely by Large Language Model reasoning. It also incorporates several significant enhancements: 'Retrieval-Augmented Generation', including both one-line and offline modes; 'Chain-of-Thought' prompting for intermediate reasoning; persistent 'PenTest Task Trees' to track goal…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Scientific Computing and Data Management · Topic Modeling
