Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs
Andrew Maranhão Ventura D'addario

TL;DR
This paper introduces Medical Malice, a large dataset of adversarial prompts tailored to healthcare contexts, designed to improve safety and ethical compliance of LLMs in medical environments by capturing nuanced, system-specific violations.
Contribution
The paper presents Medical Malice, a novel dataset of 214,219 context-aware adversarial prompts, each paired with the reasoning behind the violation, enabling models to internalize ethical boundaries specific to healthcare systems.
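As a rough illustration of what "prompts with reasoning" could look like in practice, the sketch below models one record as a prompt, the reasoning behind the violation, and a category label. The class name `MaliceRecord`, its field names, and the helper `count_by_category` are assumptions made for this example, not the released schema; the placeholder strings are not drawn from the dataset. The two category labels used are among those named in the abstract.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MaliceRecord:
    """Hypothetical record layout; field names are assumptions, not the released schema."""
    prompt: str      # adversarial request situated in a healthcare context
    reasoning: str   # explanation of why the request constitutes a violation
    category: str    # label for one of the seven taxonomies


def count_by_category(records):
    """Return how many records fall under each category label."""
    return Counter(r.category for r in records)


# Placeholder records: the text fields are stand-ins, not real dataset entries.
examples = [
    MaliceRecord("<prompt text>", "<reasoning text>", "queue-jumping"),
    MaliceRecord("<prompt text>", "<reasoning text>", "procurement manipulation"),
    MaliceRecord("<prompt text>", "<reasoning text>", "queue-jumping"),
]

counts = count_by_category(examples)
```

Keeping the reasoning as a separate field, rather than folding it into the prompt, would let a training pipeline supervise on the explanation as well as on the refusal itself.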
Findings
Created a large, context-specific adversarial dataset for healthcare LLM safety
Synthesized high-fidelity threats across seven healthcare-related categories
Advocated for a shift to context-aware safety in medical AI systems
Abstract
The integration of Large Language Models (LLMs) into healthcare demands a safety paradigm rooted in primum non nocere. However, current alignment techniques rely on generic definitions of harm that fail to capture context-dependent violations, such as administrative fraud and clinical discrimination. To address this, we introduce Medical Malice: a dataset of 214,219 adversarial prompts calibrated to the regulatory and ethical complexities of the Brazilian Unified Health System (SUS). Crucially, the dataset includes the reasoning behind each violation, enabling models to internalize ethical boundaries rather than merely memorizing a fixed set of refusals. Using an unaligned agent (Grok-4) within a persona-driven pipeline, we synthesized high-fidelity threats across seven taxonomies, ranging from procurement manipulation and queue-jumping to obstetric violence. We discuss the…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Adversarial Robustness in Machine Learning
