An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments
Hongjang Yang, Hyunsik Na, Daeseon Choi

TL;DR
This study investigates privacy leakage via prompt injection attacks in black-box chatbots, demonstrating how attackers can hijack tasks and exfiltrate data through crafted external content.
Contribution
It introduces a novel prompt-injection technique called exemplification and evaluates its effectiveness in privacy-leakage scenarios in black-box chatbot environments.
Findings
Prompt injection can hijack chatbot tasks effectively.
The exemplification technique outperforms prior fake-completion methods.
A proof-of-concept shows that data exfiltration is feasible in controlled settings.
Abstract
LLM-based chatbot agents increasingly process user requests by combining natural-language reasoning with external tools such as web browsing. These capabilities improve usability, but they also create attack surfaces when untrusted external content is processed as part of a user's task. This paper studies a privacy-leakage attack chain based on indirect prompt injection in black-box chatbot environments, where the attacker has no access to model weights, system prompts, or agent implementation details, including how the agent manages its trajectory while processing a query. We first analyze how an attacker can hijack an agent's intended task by crafting external content that appears benign to the victim while inducing the agent to execute an attacker-defined objective. We then evaluate a new prompt-injection technique, called exemplification, which uses a bridge in the…
