Language Models Can Autonomously Hack and Self-Replicate

Alena Air; Reworr; Nikolaj Kotov; Dmitrii Volkov; John Steidley; Jeffrey Ladish

arXiv:2605.06760·cs.CR·May 11, 2026

TL;DR

This paper shows that language models can autonomously exploit vulnerable web applications, extract credentials, and deploy a copy of their own weights and harness on the compromised host, and that a successful replica can repeat the process against a new target.

Contribution

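It introduces the first demonstration of language models autonomously hacking, replicating, and deploying themselves. Across the four vulnerability classes tested (hash bypass, server-side template injection, SQL injection, and broken access control), the Qwen models' success rates match the current-generation GPT-5.4 and exceed prior-generation frontier models (Opus 4 at 6%, GPT-5 at 0%).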

Findings

01 Qwen3.5-122B-A10B succeeds in 6-19% of attempts.

02 Qwen3.6-27B reaches a 33% success rate on a single A100.

03 When replicating Qwen weights, frontier models reach 81% (Opus 4.6) and 33% (GPT-5.4).

Abstract

We demonstrate that language models can autonomously replicate their weights and harness across a network by exploiting vulnerable hosts. The agent independently finds and exploits a web-application vulnerability, extracts credentials, and deploys an inference server with a copy of its harness and prompt on the compromised host. We test four vulnerability classes: hash bypass, server-side template injection, SQL injection, and broken access control. Qwen3.5-122B-A10B succeeds in 6-19% of attempts, and the smaller Qwen3.6-27B reaches 33% on a single A100. This already matches the current-generation GPT-5.4 and exceeds the prior-generation frontier, where Opus 4 reached 6% and GPT-5 reached 0%. Replicating Qwen weights, frontier models reach 81% (Opus 4.6) and 33% (GPT-5.4). This process chains: a successful replica can repeat it against a new target, producing additional copies…
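To put the reported per-attempt rates in perspective, the following is a minimal arithmetic sketch, not the paper's evaluation method. It assumes, purely for illustration, that attempts are independent and that every hop in a chain succeeds with the same probability; the abstract states neither. The function names `at_least_one_success` and `chain_success` are hypothetical.

```python
def at_least_one_success(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds.

    Assumption (not from the paper): tries are independent, each with
    per-attempt success probability p.
    """
    return 1.0 - (1.0 - p) ** attempts


def chain_success(p: float, hops: int) -> float:
    """Probability that `hops` consecutive replication steps all succeed.

    Assumption (not from the paper): every hop succeeds independently
    with the same probability p.
    """
    return p ** hops


if __name__ == "__main__":
    # Per-attempt rates quoted in the abstract.
    reported = {"6%": 0.06, "19%": 0.19, "33%": 0.33, "81%": 0.81}
    for label, p in reported.items():
        print(
            f"rate {label}: "
            f"P(>=1 success in 5 tries) = {at_least_one_success(p, 5):.3f}, "
            f"P(3-hop chain succeeds) = {chain_success(p, 3):.3f}"
        )
```

Under these assumptions, a low per-attempt rate still adds up over repeated tries, while the chance of completing a multi-hop chain on the first try at every hop shrinks geometrically with chain length.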
