Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare
Pardis Sadat Zahraei, Zahra Shakeri

TL;DR
This paper introduces new datasets and a fine-tuned model to detect and reduce biases in large language models for healthcare, significantly improving diagnostic accuracy and ethical reasoning.
Contribution
It presents two datasets for bias evaluation and diagnostic assessment, and introduces EthiClinician, a fine-tuned model that surpasses GPT-4 in ethical and clinical performance.
Findings
EthiClinician outperforms GPT-4 in ethical reasoning.
The BiasMD dataset effectively evaluates biases in healthcare LLMs.
The DiseaseMatcher dataset assesses diagnostic accuracy across 700 diseases.
Abstract
Biased AI-generated medical advice and misdiagnoses can jeopardize patient safety, making the integrity of AI in healthcare more critical than ever. As Large Language Models (LLMs) take on a growing role in medical decision-making, addressing their biases and enhancing their accuracy is key to delivering safe, reliable care. This study addresses these challenges head-on by introducing new resources designed to promote ethical and precise AI in healthcare. We present two datasets: BiasMD, featuring 6,007 question-answer pairs crafted to evaluate and mitigate biases in health-related LLM outputs, and DiseaseMatcher, with 32,000 clinical question-answer pairs spanning 700 diseases, aimed at assessing symptom-based diagnostic accuracy. Using these datasets, we developed the EthiClinician, a fine-tuned model built on the ChatDoctor framework, which outperforms GPT-4 in both ethical reasoning…
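As a rough illustration of what "assessing symptom-based diagnostic accuracy" over question-answer pairs can look like, the sketch below scores a model by exact-match accuracy. It is not the authors' evaluation code: the function name diagnostic_accuracy, the (question, answer) tuple layout, the toy records, and the keyword_model stand-in are all assumptions made up for this example and are not taken from DiseaseMatcher.

```python
from typing import Callable, Iterable, Tuple


def diagnostic_accuracy(
    pairs: Iterable[Tuple[str, str]],
    predict: Callable[[str], str],
) -> float:
    """Fraction of questions whose predicted label matches the reference.

    Matching is case-insensitive exact match after stripping whitespace,
    which is an assumption of this sketch, not the paper's stated metric.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = sum(
        1
        for question, answer in pairs
        if predict(question).strip().lower() == answer.strip().lower()
    )
    return correct / len(pairs)


# Invented toy records (hypothetical; NOT drawn from the actual dataset).
toy_pairs = [
    ("Patient reports fever, cough, and body aches.", "influenza"),
    ("Patient reports excessive thirst and frequent urination.", "diabetes"),
    ("Patient reports itchy, watery eyes and sneezing.", "allergic rhinitis"),
    ("Patient reports a dry cough and wheezing.", "asthma"),
]


def keyword_model(question: str) -> str:
    """Hypothetical stand-in for an LLM call: a trivial keyword rule."""
    text = question.lower()
    if "cough" in text:
        return "Influenza"
    if "thirst" in text:
        return "diabetes"
    return "unknown"


if __name__ == "__main__":
    print(diagnostic_accuracy(toy_pairs, keyword_model))
```

In practice the predict callable would wrap a real model, and free-text answers would likely need more forgiving matching than exact string comparison.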
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Artificial Intelligence in Healthcare · Machine Learning in Healthcare
Methods: Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Attention Is All You Need · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings
