Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs
Anirudh Sekar, Mrinal Agarwal, Rachel Sharma, Akitsugu Tanaka, Jasmine Zhang, Arjun Damerla, Kevin Zhu

TL;DR
This paper introduces ZEDD, a zero-shot, lightweight embedding drift detection method that effectively identifies prompt injection attacks in LLMs without model access or retraining, enhancing security across diverse architectures.
Contribution
The paper proposes ZEDD, a novel zero-shot detection framework based on semantic embedding drift, capable of identifying prompt injections without prior attack knowledge or model internals.
Findings
Achieves over 93% accuracy in detecting prompt injections.
Operates with less than 3% false positive rate.
Outperforms traditional detection methods in efficiency and accuracy.
Abstract
Prompt injection attacks have become an increasing vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such as emails or user-generated content to circumvent alignment safeguards and induce harmful or unintended outputs. Despite advances in alignment, even state-of-the-art LLMs remain broadly vulnerable to adversarial prompts, underscoring the urgent need for robust, productive, and generalizable detection mechanisms beyond inefficient, model-specific patches. In this work, we propose Zero-Shot Embedding Drift Detection (ZEDD), a lightweight, low-engineering-overhead framework that identifies both direct and indirect prompt injection attempts by quantifying semantic shifts in embedding space between benign and suspect inputs. ZEDD operates without requiring access to model internals, prior knowledge of attack types, or task-specific retraining,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSecurity and Verification in Computing · Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning
