Calibrated? Not for Everyone: How Sexual Orientation and Religious Markers Distort LLM Accuracy and Confidence in Medical QA

Alberto Testoni; Iacer Calixto

arXiv:2604.17316 · cs.CL · April 21, 2026

TL;DR

Patient identity markers such as sexual orientation and religious affiliation degrade both the accuracy and the confidence calibration of large language models in medical question answering, which puts safe clinical deployment at risk.

Contribution

The paper shows how social descriptors of a patient distort LLM accuracy and uncertainty calibration, and highlights the resulting risks for equitable deployment of healthcare AI.

Findings

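01. Identity markers cause performance drops in LLMs, and "homosexual" markers trigger them consistently.

02. Intersectional identities produce idiosyncratic, non-additive harms to calibration.

03. The failures persist in open-ended generation, as a clinician-validated case study confirms, so they are not an artifact of the multiple-choice format.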

Abstract

Safe clinical deployment of Large Language Models (LLMs) requires not only high accuracy but also robust uncertainty calibration to ensure models defer to clinicians when appropriate. Our paper investigates how social descriptors of a patient (specifically sexual orientation and religious affiliation) distort these uncertainty signals and model accuracy. Evaluating nine general-purpose and biomedical LLMs on 2,364 medical questions and their counterfactual variants, we demonstrate that identity markers cause a "calibration crisis". "Homosexual" markers consistently trigger performance drops, and intersectional identities produce idiosyncratic, non-additive harms to calibration. Moreover, a clinician-validated case study in an open-ended generation setting confirms that these failures are not an artifact of the multiple-choice format. Our results demonstrate that the presence of social…
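
The abstract turns on the idea of confidence calibration, so a small illustration may help. The listing does not say which calibration metric the authors use; the sketch below assumes Expected Calibration Error (ECE), a common choice, and the function names `expected_calibration_error` and `calibration_shift` are hypothetical helpers introduced here, not taken from the paper. The idea is to bin predictions by stated confidence, compare average confidence with observed accuracy in each bin, and then contrast the score on original questions with the score on their counterfactual variants.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: weighted mean of |avg confidence - accuracy| per bin.

    Assumption: this is a generic metric, not necessarily the one used in the paper.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Group (confidence, correctness) pairs into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


def calibration_shift(
    base: tuple[Sequence[float], Sequence[bool]],
    variant: tuple[Sequence[float], Sequence[bool]],
    n_bins: int = 10,
) -> float:
    """ECE on the counterfactual variant minus ECE on the original questions.

    A positive value means calibration got worse once the marker was added.
    """
    return (
        expected_calibration_error(*variant, n_bins=n_bins)
        - expected_calibration_error(*base, n_bins=n_bins)
    )


# Made-up toy numbers for illustration only; these are NOT results from the paper.
original = ([0.9, 0.9, 0.9, 0.9], [True, True, True, True])
with_marker = ([0.9, 0.9, 0.9, 0.9], [True, True, False, False])
shift = calibration_shift(original, with_marker)
```

In this toy case the model's stated confidence is unchanged while its accuracy falls, so the shift is positive: the model stays just as sure of itself while being wrong more often, which is the kind of failure that would stop it from deferring to a clinician.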
