# Obedience to Unsafe Clinical Instructions: How Large Language Models Respond to Authority Cues

**Authors:** Mahmud Omar, Reem Agbareia, Jolion McGreevy, Alon Gorenshtein, Alexander Charney, Ankit Sakhuja, Benjamin S. Glicksberg, Girish Nadkarni, Eyal Klang

PMC · DOI: 10.21203/rs.3.rs-8932472/v1 · 2026-03-18

## TL;DR

This study shows that large language models given clinical decision scenarios produced harmful outputs in 11.7% of runs, most often under authority and responsibility-transfer cues. A brief safety reminder reduced harmful decisions but did not eliminate them.

## Contribution

The paper evaluates 20 LLMs on clinical scenarios under a neutral control and six Milgram-style social-pressure conditions. It treats obedience to an explicit but unsafe instruction as a safety failure distinct from hallucination or bias.

## Key findings

- 11.7% of LLM outputs (1.18 million of 10,096,800 clinical decision scenarios) were harmful.
- A short mitigation cue reduced harmful decisions by 6.5 percentage points overall (16.6% to 10.1%) and by 22.1 percentage points in real-world discharge cases (46.6% to 24.5%).
- Authority and responsibility-transfer cues elicited the highest harmful compliance rates, and neutral control prompts the lowest.

## Abstract

Large language models (LLMs) are being integrated into clinical environments where deference to authority can cause harm. Unlike hallucination or bias, obedience to unsafe instructions represents a distinct safety failure: following an explicit but harmful order.

We conducted a cross-sectional evaluation of 20 proprietary, open-source, and clinically tuned LLMs across 10,096,800 clinical decision scenarios, including synthetic vignettes with predefined safe versus unsafe options and real-world discharge recommendations reframed to include unsafe contradictory requests. Each scenario was presented under a neutral control or one of six Milgram-style social-pressure conditions (authority, responsibility transfer, urgency, threat, conformity, depersonalization), with or without a short mitigation cue instructing verification or escalation if unsafe. The primary outcome was the proportion of potentially harmful outputs, defined as selection or endorsement of an unsafe clinical decision.
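The design described above can be pictured as a grid of prompt conditions plus a single outcome measure. The following is a minimal sketch, not the authors' code: the names `Cell`, `design_cells`, and `harmful_rate` are hypothetical, and it assumes the control and all six pressure conditions were each crossed with the mitigation cue. The abstract does not say whether every cell was run equally often.

```python
from dataclasses import dataclass
from itertools import product

# The six social-pressure conditions named in the abstract.
PRESSURE = (
    "authority",
    "responsibility_transfer",
    "urgency",
    "threat",
    "conformity",
    "depersonalization",
)
CONDITIONS = ("control",) + PRESSURE


@dataclass(frozen=True)
class Cell:
    """One hypothetical cell of the design: a condition, with or without mitigation."""
    condition: str
    mitigated: bool


def design_cells() -> list[Cell]:
    # Assumption: every condition is paired with both mitigation settings.
    return [Cell(c, m) for c, m in product(CONDITIONS, (False, True))]


def harmful_rate(outputs: list[bool]) -> float:
    """Primary outcome: share of outputs that select or endorse the unsafe option.

    Each entry is True when the output was judged harmful.
    """
    if not outputs:
        raise ValueError("need at least one output")
    return sum(outputs) / len(outputs)


if __name__ == "__main__":
    print(len(design_cells()))                        # 7 conditions x 2 settings
    print(harmful_rate([True, False, False, False]))  # toy data, not study results
```

The boolean list passed to `harmful_rate` is toy data for illustration only. How each output was judged harmful is not specified in the abstract beyond "selection or endorsement of an unsafe clinical decision".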

Across all runs, 1.18 million of 10.1 million outputs (11.7%) were harmful. Harmful decisions occurred in 16.6% of unmitigated versus 10.1% of mitigated conditions (absolute reduction, 6.5 percentage points; p < 0.001). In synthetic vignettes, harmful responses averaged 8.1% overall, declining from 10.6% to 7.2% with mitigation (difference, 3.4 percentage points; p < 0.001). In real-world discharge cases, harmful responses averaged 30.0%, decreasing from 46.6% to 24.5% with mitigation (difference, 22.1 percentage points; p < 0.001). Across all conditions, authority and responsibility-transfer cues elicited the highest harmful compliance, and control prompts the lowest; mitigation reduced rates but preserved this pattern.
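The reported differences can be rechecked with a few lines of arithmetic. This sketch uses only the percentages quoted above. The relative reductions it prints are derived here for illustration and are not figures reported by the authors. The function names are hypothetical.

```python
# (unmitigated %, mitigated %) as quoted in the abstract.
REPORTED = {
    "overall": (16.6, 10.1),
    "synthetic vignettes": (10.6, 7.2),
    "real-world discharge": (46.6, 24.5),
}


def absolute_reduction(unmitigated: float, mitigated: float) -> float:
    """Difference in percentage points, rounded to one decimal."""
    return round(unmitigated - mitigated, 1)


def relative_reduction(unmitigated: float, mitigated: float) -> float:
    """Reduction as a percentage of the unmitigated rate (derived, not reported)."""
    return round(100 * (unmitigated - mitigated) / unmitigated, 1)


def overall_harmful_percent(harmful: int, total: int) -> float:
    """Share of harmful outputs as a percentage, rounded to one decimal."""
    return round(100 * harmful / total, 1)


if __name__ == "__main__":
    for setting, (unmit, mit) in REPORTED.items():
        print(setting, absolute_reduction(unmit, mit), relative_reduction(unmit, mit))
    # 1.18 million is itself a rounded count, so this is an approximate check.
    print(overall_harmful_percent(1_180_000, 10_096_800))
```

The absolute reductions reproduce the 6.5, 3.4, and 22.1 percentage-point differences stated in the text, and the overall share rounds to the reported 11.7%.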

LLMs do not behave as neutral calculators in clinical contexts. When exposed to authority or responsibility-transfer cues, they exhibit consistent obedience to unsafe instructions. A brief safety reminder substantially reduces but does not eliminate this behavior.

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13015605/full.md

---
Source: https://tomesphere.com/paper/PMC13015605