Evaluating the Presence of Sex Bias in Clinical Reasoning by Large Language Models
Isabel Tsintsiper, Sheng Wong, Beth Albert, Shaun P Brennecke, Gabriel Davis Jones

TL;DR
This study systematically investigates sex bias in large language models used for clinical reasoning, revealing stable, model-specific biases that could impact healthcare decisions and emphasizing the need for careful oversight and auditing.
Contribution
It provides a comprehensive analysis of sex bias in multiple contemporary LLMs in clinical scenarios, highlighting the influence of model configuration on bias manifestation.
Findings
All models showed significant sex-assignment skew.
Model-specific biases were consistent across experiments.
Adjusting model settings did not eliminate diagnostic differences.
Abstract
Large language models (LLMs) are increasingly embedded in healthcare workflows for documentation, education, and clinical decision support. However, these systems are trained on large text corpora that encode existing biases, including sex disparities in diagnosis and treatment, raising concerns that such patterns may be reproduced or amplified. We systematically examined whether contemporary LLMs exhibit sex-specific biases in clinical reasoning and how model configuration influences these behaviours. We conducted three experiments using 50 clinician-authored vignettes spanning 44 specialties in which sex was non-informative to the initial diagnostic pathway. We evaluated four general-purpose LLMs: ChatGPT (gpt-4o-mini), Claude 3.7 Sonnet, Gemini 2.0 Flash and DeepSeekchat. All models demonstrated significant sex-assignment skew, with predicted sex differing by model. At temperature 0.5, ChatGPT…
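The abstract reports "significant sex-assignment skew" but the excerpt here does not say which statistical test was used. As a minimal sketch of one way such skew could be checked, the snippet below runs an exact two-sided binomial test against a 50/50 null. The function name `binomial_two_sided_p` and the counts are hypothetical and purely illustrative; they are not the authors' method or results.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed count k under the null proportion p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so floating-point ties are counted.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical example: a model assigns "female" to 35 of 50 vignettes
# in which sex is non-informative. These counts are NOT from the paper.
female_assignments = 35
total_vignettes = 50
p_value = binomial_two_sided_p(female_assignments, total_vignettes)
print(f"two-sided p-value: {p_value:.4f}")
```

A small p-value here would indicate that the assignments depart from an even split; a real analysis would also need to account for repeated runs per vignette and multiple comparisons across models.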
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Sex and Gender in Healthcare
