LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks
Francesco Panebianco, Stefano Bonfanti, Francesco Trovò, Michele Carminati

TL;DR
LeakSealer is a semi-supervised, model-agnostic framework that enhances LLM security by analyzing interaction data, detecting prompt injection and leakage attacks through static and dynamic methods, and providing forensic insights.
Contribution
The paper introduces LeakSealer, a novel semi-supervised defense framework combining static analysis and dynamic detection with forensic analysis for LLM security.
Findings
LeakSealer achieves high precision and recall in prompt injection detection.
LeakSealer detects PII leakage with an AUPRC of 0.97, outperforming baselines.
The approach provides forensic insights into attack evolution.
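For readers unfamiliar with the metric in the PII finding, AUPRC (area under the precision-recall curve) summarizes how well a detector ranks true leaks above benign responses. The sketch below computes the common step-wise estimate, often called average precision. It is an illustration only: the function name, the toy labels, and the toy scores are assumptions, not data or code from the paper, and the authors may compute the curve differently (for example with interpolation). Tied scores are not given any special treatment here.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a true leak, 0 for a benign response (toy convention).
    scores: detector scores, higher meaning more suspicious.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank examples from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at this rank, weighted by the recall gained (1 / total_pos).
            ap += (true_pos / rank) / total_pos
    return ap


# Hypothetical toy data: two leaks, two benign responses.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(toy_labels, toy_scores), 4))
```

A detector that ranks every leak above every benign response scores 1.0 under this estimate, so a value of 0.97 indicates a ranking close to that ideal.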
Abstract
The generalization capabilities of Large Language Models (LLMs) have led to their widespread deployment across various applications. However, this increased adoption has introduced several security threats, notably in the forms of jailbreaking and data leakage attacks. Additionally, Retrieval Augmented Generation (RAG), while enhancing context-awareness in LLM responses, has inadvertently introduced vulnerabilities that can result in the leakage of sensitive information. Our contributions are twofold. First, we introduce a methodology to analyze historical interaction data from an LLM system, enabling the generation of usage maps categorized by topics (including adversarial interactions). This approach further provides forensic insights for tracking the evolution of jailbreaking attack patterns. Second, we propose LeakSealer, a model-agnostic framework that combines static analysis for…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing
