Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs

Andrew Maranhão Ventura D'addario

arXiv:2511.21757·cs.CY·December 1, 2025

TL;DR

This paper introduces Medical Malice, a large dataset of adversarial prompts tailored to healthcare contexts, designed to improve safety and ethical compliance of LLMs in medical environments by capturing nuanced, system-specific violations.

Contribution

The paper presents Medical Malice, a novel dataset of 214,219 context-aware adversarial prompts with reasoning, enabling models to internalize ethical boundaries specific to healthcare systems.

Findings

01 Created a large, context-specific adversarial dataset for healthcare LLM safety

02 Synthesized high-fidelity threats across seven healthcare-related categories

03 Advocated for a shift to context-aware safety in medical AI systems

Abstract

The integration of Large Language Models (LLMs) into healthcare demands a safety paradigm rooted in primum non nocere. However, current alignment techniques rely on generic definitions of harm that fail to capture context-dependent violations, such as administrative fraud and clinical discrimination. To address this, we introduce Medical Malice: a dataset of 214,219 adversarial prompts calibrated to the regulatory and ethical complexities of the Brazilian Unified Health System (SUS). Crucially, the dataset includes the reasoning behind each violation, enabling models to internalize ethical boundaries rather than merely memorizing a fixed set of refusals. Using an unaligned agent (Grok-4) within a persona-driven pipeline, we synthesized high-fidelity threats across seven taxonomies, ranging from procurement manipulation and queue-jumping to obstetric violence. We discuss the…

Code & Models

Datasets

Larxel/medical-malice
dataset · 13 dl
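
The abstract describes each entry as an adversarial prompt paired with the reasoning behind the violation, grouped into seven categories. A minimal sketch of how such records might be represented and turned into refusal-style training pairs follows. The field names (`prompt`, `category`, `reasoning`), the `MaliceRecord` type, and the refusal format are all assumptions made for illustration and are not taken from the published dataset; the prompt and reasoning strings are placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MaliceRecord:
    # All field names are hypothetical; the published schema may differ.
    prompt: str     # adversarial request (placeholder text in this sketch)
    category: str   # violation category label
    reasoning: str  # why the request is a violation


def count_by_category(records):
    """Return how many records fall under each category label."""
    return Counter(r.category for r in records)


def to_refusal_example(record):
    """Pair a prompt with a refusal that cites the record's reasoning.

    The target format is an assumption, not the paper's training recipe.
    """
    return {
        "input": record.prompt,
        "target": f"I can't help with that. {record.reasoning}",
    }


# Placeholder records using category names mentioned in the abstract.
sample = [
    MaliceRecord("<prompt 1>", "procurement manipulation", "<reasoning 1>"),
    MaliceRecord("<prompt 2>", "queue-jumping", "<reasoning 2>"),
    MaliceRecord("<prompt 3>", "queue-jumping", "<reasoning 3>"),
]

counts = count_by_category(sample)
example = to_refusal_example(sample[0])
```

Keeping the reasoning alongside the prompt lets the refusal explain why the request is out of bounds, which is the property the abstract emphasizes over a fixed set of memorized refusals.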

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Adversarial Robustness in Machine Learning