MPIB: A Benchmark for Medical Prompt Injection Attacks and Clinical Safety in LLMs
Junhyeok Lee, Han Jang, and Kyu Sung Choi

TL;DR
This paper introduces MPIB, a benchmark dataset for evaluating how safely Large Language Models behave in clinical settings under prompt injection attacks, with an emphasis on real-world clinical harm.
Contribution
MPIB is the first benchmark to evaluate the clinical safety of LLMs against prompt injection. It reports an outcome-level risk metric alongside attack success and covers both direct and indirect (RAG-mediated) injection scenarios.
Findings
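Attack Success Rate (ASR) and Clinical Harm Event Rate (CHER) can diverge significantly, so instruction compliance alone does not capture patient risk.
Robustness varies depending on whether the adversarial prompt appears in the user query or in the retrieved context.
The evaluation reveals critical vulnerabilities in current LLM defenses.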
Abstract
Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly integrated into clinical workflows; however, prompt injection attacks can steer these systems toward clinically unsafe or misleading outputs. We introduce the Medical Prompt Injection Benchmark (MPIB), a dataset-and-benchmark suite for evaluating clinical safety under both direct prompt injection and indirect, RAG-mediated injection across clinically grounded tasks. MPIB emphasizes outcome-level risk via the Clinical Harm Event Rate (CHER), which measures high-severity clinical harm events under a clinically grounded taxonomy, and reports CHER alongside Attack Success Rate (ASR) to disentangle instruction compliance from downstream patient risk. The benchmark comprises 9,697 curated instances constructed through multi-stage quality gates and clinical safety linting. Evaluating MPIB across a…
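To make the distinction between ASR and CHER concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes each evaluated instance has already been labeled with two hypothetical fields: whether the model complied with the injected instruction, and an integer harm severity. The `Outcome` record, the 0-4 severity scale, and the `HIGH_SEVERITY` cutoff are illustrative assumptions; MPIB's actual taxonomy and scoring rules are defined in the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Outcome:
    # Hypothetical label: did the model follow the injected instruction?
    attack_succeeded: bool
    # Hypothetical 0-4 severity score; NOT the paper's harm taxonomy.
    harm_severity: int


# Assumed cutoff for a "high-severity" event (illustrative only).
HIGH_SEVERITY = 3


def asr(outcomes: Sequence[Outcome]) -> float:
    """Fraction of instances where the injected instruction was followed."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(o.attack_succeeded for o in outcomes) / len(outcomes)


def cher(outcomes: Sequence[Outcome], threshold: int = HIGH_SEVERITY) -> float:
    """Fraction of instances whose output reached the high-severity cutoff."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(o.harm_severity >= threshold for o in outcomes) / len(outcomes)


# Toy data: three attacks succeed, but only one yields a high-severity event.
outcomes = [
    Outcome(attack_succeeded=True, harm_severity=0),
    Outcome(attack_succeeded=True, harm_severity=1),
    Outcome(attack_succeeded=True, harm_severity=4),
    Outcome(attack_succeeded=False, harm_severity=0),
]

print(f"ASR  = {asr(outcomes):.2f}")   # ASR  = 0.75
print(f"CHER = {cher(outcomes):.2f}")  # CHER = 0.25
```

On this toy data the two rates differ by a factor of three, which shows why reporting only attack success could overstate or understate downstream patient risk.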
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare
