A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties
Jinghao Wang, Ping Zhang, Carter Yagemann

TL;DR
This paper introduces a practical, reproducible framework for evaluating the security of medical AI models, focusing on robustness against jailbreaking and privacy attacks across various clinical specialties, using accessible resources.
Contribution
It provides a comprehensive, resource-efficient evaluation framework that enables community-wide assessment of medical AI security without requiring expensive hardware or sensitive data.
Findings
Framework supports multiple specialties and attack types.
Evaluation runs on consumer hardware with synthetic data.
Establishes a foundation for comparative security analysis.
Abstract
Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU clusters, commercial API access, or protected health data -- barriers that limit community participation in this critical research area. We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints. Our framework design covers multiple medical specialties stratified by clinical risk -- from high-risk domains such as emergency medicine and psychiatry to general practice -- addressing jailbreaking attacks (role-playing, authority impersonation, multi-turn manipulation) and privacy extraction attacks. All evaluation utilizes synthetic…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · Information and Cyber Security
