Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems
Andrii Balashov, Olena Ponomarova, Xiaohua Zhai

TL;DR
This paper investigates multi-stage prompt inference attacks on enterprise LLMs, shows that they can extract sensitive data despite safety measures, and proposes defenses including anomaly detection and input transformations.
Contribution
It introduces a formal threat model for multi-stage inference attacks on enterprise LLMs and evaluates multiple defense strategies with theoretical and experimental support.
Findings
Attacks can reliably exfiltrate sensitive enterprise data
Anomaly detection achieves high detection accuracy (AUC)
Input transformations significantly reduce attack success
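To illustrate why a chain of individually benign prompts can slip past per-prompt filtering yet still be caught at the session level, here is a minimal sketch. It is not the paper's detector: the per-prompt sensitivity scores, both thresholds, and the names `SessionMonitor` and `run_session` are hypothetical, chosen only to show the idea of accumulating evidence across turns.

```python
from dataclasses import dataclass, field

# Assumed values for illustration only; the source does not specify them.
PER_PROMPT_THRESHOLD = 0.8  # a single prompt scoring at or above this is blocked
SESSION_BUDGET = 1.5        # cumulative score tolerated within one session


@dataclass
class SessionMonitor:
    """Tracks cumulative sensitivity across the turns of one session."""
    total: float = 0.0
    history: list = field(default_factory=list)

    def observe(self, score: float) -> str:
        """Classify one turn given its (hypothetical) sensitivity score in [0, 1]."""
        if score >= PER_PROMPT_THRESHOLD:
            # An overtly sensitive prompt is stopped by the per-prompt filter.
            return "block_prompt"
        self.total += score
        self.history.append(score)
        if self.total > SESSION_BUDGET:
            # No single prompt was alarming, but the session as a whole is.
            return "flag_session"
        return "allow"


def run_session(scores):
    """Feed a sequence of per-prompt scores through a fresh monitor."""
    monitor = SessionMonitor()
    return [monitor.observe(s) for s in scores]


if __name__ == "__main__":
    # Four mild prompts: each passes the per-prompt check, the fourth trips the budget.
    print(run_session([0.5, 0.5, 0.5, 0.5]))
    # One blunt prompt: caught immediately by the per-prompt check.
    print(run_session([0.9]))
```

In this toy setup, a per-prompt filter alone would allow all four mild prompts; only the running total exposes the multi-stage pattern. A real system would need a scoring model and calibrated thresholds, neither of which is described in this summary.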
Abstract
Large Language Models (LLMs) deployed in enterprise settings (e.g., as Microsoft 365 Copilot) face novel security challenges. One critical threat is prompt inference attacks: adversaries chain together seemingly benign prompts to gradually extract confidential data. In this paper, we present a comprehensive study of multi-stage prompt inference attacks in an enterprise LLM context. We simulate realistic attack scenarios where an attacker uses mild-mannered queries and indirect prompt injections to exploit an LLM integrated with private corporate data. We develop a formal threat model for these multi-turn inference attacks and analyze them using probability theory, optimization frameworks, and information-theoretic leakage bounds. The attacks are shown to reliably exfiltrate sensitive information from the LLM's context (e.g., internal SharePoint documents or emails), even when standard…
