Hacking Back the AI-Hacker: Prompt Injection as a Defense Against   LLM-driven Cyberattacks

Dario Pasquini; Evgenios M. Kornaropoulos; Giuseppe Ateniese

arXiv:2410.20911·cs.CR·November 19, 2024

Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks

Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese

PDF

Open Access 1 Repo

TL;DR

This paper presents Mantis, a novel defense framework that leverages prompt injection techniques to autonomously disrupt or compromise malicious LLM-driven cyberattacks, achieving over 95% effectiveness.

Contribution

Introducing Mantis, the first system that uses adversarial prompt injections to actively defend against and hack back LLM-based cyberattacks.

Findings

01

Achieved over 95% success rate in experiments

02

Effectively disrupts malicious LLM operations

03

Can autonomously hack back attackers

Abstract

Large language models (LLMs) are increasingly being harnessed to automate cyberattacks, making sophisticated exploits more accessible and scalable. In response, we propose a new defense strategy tailored to counter LLM-driven cyberattacks. We introduce Mantis, a defensive framework that exploits LLMs' susceptibility to adversarial inputs to undermine malicious operations. Upon detecting an automated cyberattack, Mantis plants carefully crafted inputs into system responses, leading the attacker's LLM to disrupt their own operations (passive defense) or even compromise the attacker's machine (active defense). By deploying purposefully vulnerable decoy services to attract the attacker and using dynamic prompt injections for the attacker's LLM, Mantis can autonomously hack back the attacker. In our experiments, Mantis consistently achieved over 95% effectiveness against automated LLM-driven…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

pasquini-dario/project_mantis
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Digital and Cyber Forensics