Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

Anirudh Sekar; Mrinal Agarwal; Rachel Sharma; Akitsugu Tanaka; Jasmine Zhang; Arjun Damerla; Kevin Zhu

arXiv:2601.12359·cs.CR·January 21, 2026

Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

Anirudh Sekar, Mrinal Agarwal, Rachel Sharma, Akitsugu Tanaka, Jasmine Zhang, Arjun Damerla, Kevin Zhu

PDF

Open Access

TL;DR

This paper introduces ZEDD, a zero-shot, lightweight embedding drift detection method that effectively identifies prompt injection attacks in LLMs without model access or retraining, enhancing security across diverse architectures.

Contribution

The paper proposes ZEDD, a novel zero-shot detection framework based on semantic embedding drift, capable of identifying prompt injections without prior attack knowledge or model internals.

Findings

01

Achieves over 93% accuracy in detecting prompt injections.

02

Operates with less than 3% false positive rate.

03

Outperforms traditional detection methods in efficiency and accuracy.

Abstract

Prompt injection attacks have become an increasing vulnerability for LLM applications, where adversarial prompts exploit indirect input channels such as emails or user-generated content to circumvent alignment safeguards and induce harmful or unintended outputs. Despite advances in alignment, even state-of-the-art LLMs remain broadly vulnerable to adversarial prompts, underscoring the urgent need for robust, productive, and generalizable detection mechanisms beyond inefficient, model-specific patches. In this work, we propose Zero-Shot Embedding Drift Detection (ZEDD), a lightweight, low-engineering-overhead framework that identifies both direct and indirect prompt injection attempts by quantifying semantic shifts in embedding space between benign and suspect inputs. ZEDD operates without requiring access to model internals, prior knowledge of attack types, or task-specific retraining,…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSecurity and Verification in Computing · Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning