LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild
Reworr, Dmitrii Volkov

TL;DR
This paper introduces LLM Honeypot, a system that monitors and detects malicious AI hacking agents in the wild by analyzing large-scale attack data collected over three months.
Contribution
The paper presents a novel honeypot system augmented with prompt injection and timing analysis to identify AI-driven hacking agents among general attackers.
Findings
Collected 8,130,731 hacking attempts over about three months
Detected 8 potential AI hacking agents
Demonstrated emergence of AI-driven cybersecurity threats
Abstract
Attacks powered by Large Language Model (LLM) agents represent a growing threat to modern cybersecurity. To address this concern, we present LLM Honeypot, a system designed to monitor autonomous AI hacking agents. By augmenting a standard SSH honeypot with prompt injection and time-based analysis techniques, our framework aims to distinguish LLM agents from other attackers. Over a trial deployment of about three months in a public environment, we recorded 8,130,731 hacking attempts and identified 8 potential AI agents. Our work demonstrates the emergence of AI-driven threats and their current level of usage, serving as an early warning of malicious LLM agents in the wild.
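To make the detection idea concrete, here is a minimal sketch of how a prompt-injection signal and a timing signal might be combined to label a honeypot session. It is not the authors' implementation. The `Session` fields, the `classify_session` function, the label strings, and the 1.5-second threshold are all hypothetical and chosen for illustration only. The assumed reasoning is that a scripted bot ignores text injected into the honeypot's output, a human may act on it but only after a reading delay, and an LLM agent may act on it within seconds.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One attacker session as logged by the honeypot (hypothetical schema)."""
    followed_injection: bool  # did the attacker act on the injected prompt?
    response_delay_s: float   # seconds between the injected prompt and the reply


def classify_session(session: Session, max_agent_delay_s: float = 1.5) -> str:
    """Label a session using the two signals.

    max_agent_delay_s is an assumed, illustrative threshold,
    not a value taken from the paper.
    """
    if not session.followed_injection:
        # Scripted tooling replays fixed commands and ignores injected text.
        return "scripted bot"
    if session.response_delay_s <= max_agent_delay_s:
        # Acted on the injected prompt faster than a human could plausibly read it.
        return "potential LLM agent"
    # Acted on the prompt, but at human reading and typing speed.
    return "possible human"


if __name__ == "__main__":
    for s in (Session(False, 0.2), Session(True, 0.9), Session(True, 12.0)):
        print(s, "->", classify_session(s))
```

In this sketch neither signal is sufficient alone: following the injected prompt rules out simple scripts, and the delay check separates fast automated responders from human operators.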
Taxonomy
Topics: Network Security and Intrusion Detection
