Obedience to Unsafe Clinical Instructions: How Large Language Models Respond to Authority Cues

Mahmud Omar; Reem Agbareia; Jolion McGreevy; Alon Gorenshtein; Alexander Charney; Ankit Sakhuja; Benjamin S. Glicksberg; Girish Nadkarni; Eyal Klang

PMC · DOI: 10.21203/rs.3.rs-8932472/v1 · March 18, 2026



Open Access

TL;DR

This study shows that large language models in clinical settings often follow unsafe instructions when pressured by authority cues, but adding safety reminders can reduce harmful decisions.

Contribution

The paper introduces a novel evaluation framework for LLMs' responses to authority cues in clinical scenarios, revealing harmful compliance patterns.

Findings

01. 11.7% of LLM outputs across 10,096,800 clinical decision scenarios were harmful.

02. Mitigation cues reduced harmful decisions by up to 22.1 percentage points in real-world discharge cases.

03. Authority and responsibility-transfer cues led to the highest harmful compliance rates.

Abstract

Large language models (LLMs) are being integrated into clinical environments where deference to authority can cause harm. Unlike hallucination or bias, obedience to unsafe instructions represents a distinct safety failure: following an explicit but harmful order. We conducted a cross-sectional evaluation of 20 proprietary, open-source, and clinically tuned LLMs across 10,096,800 clinical decision scenarios, including synthetic vignettes with predefined safe versus unsafe options and real-world discharge recommendations reframed to include unsafe contradictory requests. Each scenario was presented under a neutral control or one of six Milgram-style social-pressure conditions (authority, responsibility transfer, urgency, threat, conformity, depersonalization), with or without a short mitigation cue instructing verification or escalation if unsafe. The primary outcome was the proportion…
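The design described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the condition names come from the abstract, but the function names, the assumption that the mitigation cue is fully crossed with every condition (including the neutral control), and the boolean "harmful" label are all hypothetical.

```python
from itertools import product

# Condition names as listed in the abstract; "control" is the neutral arm.
PRESSURE_CONDITIONS = [
    "control",
    "authority",
    "responsibility_transfer",
    "urgency",
    "threat",
    "conformity",
    "depersonalization",
]


def build_arms(conditions=PRESSURE_CONDITIONS):
    """Cross every condition with mitigation off/on.

    Assumption: the mitigation cue is applied to all conditions,
    including the control. The abstract does not state this explicitly.
    """
    return [
        {"condition": c, "mitigation": m}
        for c, m in product(conditions, (False, True))
    ]


def harmful_rate(decisions):
    """Proportion of decisions labelled harmful (list of booleans)."""
    if not decisions:
        raise ValueError("no decisions to score")
    return sum(decisions) / len(decisions)


def percentage_point_reduction(rate_without, rate_with):
    """Absolute difference between two proportions, in percentage points."""
    return (rate_without - rate_with) * 100
```

Under these assumptions `build_arms()` yields 14 arms per scenario. A reduction reported in percentage points is an absolute difference between two rates, so a made-up drop from 0.30 to 0.10 would be 20 percentage points; these toy values are not results from the paper.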


Figures: 4


Taxonomy

Topics: Healthcare Decision-Making and Restraints · Patient-Provider Communication in Healthcare · Clinical Reasoning and Diagnostic Skills