Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards

Philipp Normann; Andreas Happe; Jürgen Cito; Daniel Arp

arXiv:2603.17673 · cs.CR · March 19, 2026

TL;DR

This paper introduces a two-stage post-training method, supervised fine-tuning followed by reinforcement learning, that enables a small, local 4B LLM to perform Linux privilege escalation. The post-trained model reaches a 95.8% success rate and cuts inference cost per successful escalation by over 100x.

Contribution

The paper presents a novel two-stage post-training pipeline that combines supervised fine-tuning with reinforcement learning to adapt small models to security tasks.

Findings

01. Supervised fine-tuning more than doubles the baseline success rate.

02. Reinforcement learning boosts the success rate to 95.8%.

03. Inference cost per successful escalation is reduced by over 100x.

Abstract

LLM agents are increasingly relevant to research domains such as vulnerability discovery. Yet, the strongest systems remain closed and cloud-only, making them resource-intensive, difficult to reproduce, and unsuitable for work involving proprietary code or sensitive data. Consequently, there is an urgent need for small, local models that can perform security tasks under strict resource budgets, but methods for developing them remain underexplored. In this paper, we address this gap by proposing a two-stage post-training pipeline. We focus on the problem of Linux privilege escalation, where success is automatically verifiable and the task requires multi-step interactive reasoning. Using an experimental setup that prevents data leakage, we post-train a 4B model in two stages: supervised fine-tuning on traces from procedurally generated privilege-escalation environments, followed by…
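To illustrate what "automatically verifiable" success can look like for privilege escalation, here is a minimal sketch. It is not the authors' implementation. It assumes a hypothetical convention in which the environment runs `id -u` in the agent's shell at the end of an episode and grants a binary reward only if the reported UID is 0 (root). The function name `escalation_reward` and the input format are assumptions made for this example.

```python
def escalation_reward(id_u_output: str) -> float:
    """Binary reward: 1.0 if the captured `id -u` output reports UID 0, else 0.0.

    Hypothetical helper for illustration; the paper's actual check may differ.
    """
    lines = id_u_output.strip().splitlines()
    if not lines:
        # No output at all: treat the episode as a failure.
        return 0.0
    # Only the last line is inspected; any earlier output is ignored.
    last = lines[-1].strip()
    return 1.0 if last == "0" else 0.0


if __name__ == "__main__":
    print(escalation_reward("0\n"))     # root shell
    print(escalation_reward("1000\n"))  # still an unprivileged user
```

A check of this kind needs no human labeling, which is what makes the task suitable for reinforcement learning with programmatic rewards.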

Taxonomy

Topics: Software Engineering Research · Web Application Security Vulnerabilities · Information and Cyber Security