Enhancing Linux Privilege Escalation Attack Capabilities of Local LLM Agents

Benjamin Probst; Andreas Happe; Jürgen Cito

arXiv:2604.27143 · cs.CR · May 1, 2026

TL;DR

This paper systematically evaluates how targeted interventions improve open-weight LLMs at Linux privilege escalation. With these interventions, the models reach performance comparable to cloud-based models such as GPT-4o.

Contribution

The paper analyzes failure modes of open-weight models in autonomous privilege escalation, maps them to established enhancement techniques, and evaluates five concrete interventions, reporting significant performance gains.

Findings

01. Open-weight models can match or outperform cloud-based baselines.

02. Reflection-based treatments contribute most to performance improvements.

03. Vulnerability discovery remains a bottleneck for local models.

Abstract

Recent research has demonstrated the potential of Large Language Models (LLMs) for autonomous penetration testing, particularly when using cloud-based restricted-weight models. However, reliance on such models introduces security, privacy, and sovereignty concerns, motivating the use of locally hosted open-weight alternatives. Prior work shows that small open-weight models perform poorly on automated Linux privilege escalation, limiting their practical applicability. In this paper, we present a systematic empirical study of whether targeted system-level and prompting interventions can bridge this performance gap. We analyze failure modes of open-weight models in autonomous privilege escalation, map them to established enhancement techniques, and evaluate five concrete interventions (chain-of-thought prompting, retrieval-augmented generation, structured prompts, history compression,…
