Knowledge Return Oriented Prompting (KROP)
Jason Martin, Kenneth Yeung

TL;DR
This paper introduces KROP, a prompt injection technique that obfuscates malicious prompts, making them virtually undetectable by most of the prompt filtering and alignment defenses used in large language models.
Contribution
KROP is a method for obfuscating prompt injections so that prompt filters and alignment-based defenses fail to flag them, exposing a gap in the measures LLMs currently rely on against prompt-based attacks.
Findings
KROP successfully evades most prompt detection mechanisms.
Obfuscation with KROP maintains the original prompt's functionality.
KROP demonstrates robustness across various LLM architectures.
Abstract
Many Large Language Models (LLMs) and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures aren't foolproof. This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks, rendering them virtually undetectable to most of these security measures.
