LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks

Francesco Panebianco; Stefano Bonfanti; Francesco Trovò; Michele Carminati

arXiv:2508.00602 · cs.CR · August 4, 2025

TL;DR

LeakSealer is a semi-supervised, model-agnostic framework that enhances LLM security by analyzing interaction data, detecting prompt injection and leakage attacks through static and dynamic methods, and providing forensic insights.

Contribution

The paper introduces LeakSealer, a novel semi-supervised defense framework combining static analysis and dynamic detection with forensic analysis for LLM security.

Findings

01

LeakSealer achieves high precision and recall in prompt injection detection.

02

LeakSealer detects PII leakage with an AUPRC of 0.97, outperforming baselines.

03

The approach provides forensic insights into attack evolution.
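
Finding 02 reports results as AUPRC (area under the precision-recall curve). As a reminder of what that metric measures, here is a minimal sketch that computes the step-wise AUPRC, also known as average precision, for a binary detector. The labels and scores are made-up toy values, not data from the paper, and the function name `average_precision` is an illustrative choice rather than anything the authors provide. The sketch assumes no tied scores.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 0/1 ground truth (1 = leakage or attack), hypothetical toy data.
    scores: detector scores, higher means more suspicious.
    Assumes no tied scores; ties would need grouping by threshold.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank examples from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at the cutoff that just admits this positive.
            precision_sum += true_pos / rank

    # Each positive contributes an equal recall step of 1 / total_pos.
    return precision_sum / total_pos


# Toy example: three positives, one of them ranked below a negative.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

A detector that ranks every positive above every negative scores 1.0 under this definition, while a random ranking scores roughly the fraction of positives in the data. That dependence on class balance is why AUPRC is often preferred over accuracy when attacks are rare.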

Abstract

The generalization capabilities of Large Language Models (LLMs) have led to their widespread deployment across various applications. However, this increased adoption has introduced several security threats, notably in the forms of jailbreaking and data leakage attacks. Additionally, Retrieval Augmented Generation (RAG), while enhancing context-awareness in LLM responses, has inadvertently introduced vulnerabilities that can result in the leakage of sensitive information. Our contributions are twofold. First, we introduce a methodology to analyze historical interaction data from an LLM system, enabling the generation of usage maps categorized by topics (including adversarial interactions). This approach further provides forensic insights for tracking the evolution of jailbreaking attack patterns. Second, we propose LeakSealer, a model-agnostic framework that combines static analysis for…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing