Towards Fairer Health Recommendations: finding informative unbiased samples via Word Sense Disambiguation

Gavin Butts; Pegah Emdad; Jethro Lee; Shannon Song; Chiman Salavati; Willmar Sosa Diaz; Shiri Dori-Hacohen; Fabricio Murai

arXiv:2409.07424·cs.CL·September 12, 2024

TL;DR

This paper investigates bias detection in medical NLP data, proposes Word Sense Disambiguation as a way to improve dataset quality, and evaluates several models, finding that fine-tuned BERT models outperform LLMs on the bias detection task.

Contribution

It introduces a novel application of Word Sense Disambiguation to refine bias detection datasets in medical NLP and compares model performances for this task.

Findings

01. LLMs are unsuitable for bias detection in medical texts.

02. Fine-tuned BERT models perform well across metrics.

03. WSD improves dataset quality by removing irrelevant sentences.
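
To make the last finding concrete, here is a minimal sketch of how a word-sense filter might discard irrelevant sentences. This summary does not say which WSD model the authors use, so the sketch swaps in a simplified Lesk-style gloss overlap purely for illustration. The keyword `race`, the two-sense inventory `SENSES`, and the helper names `disambiguate` and `filter_candidates` are all hypothetical.

```python
import re

# Hypothetical two-sense inventory for the ambiguous keyword "race".
# The glosses are illustrative assumptions, not taken from the paper.
SENSES = {
    "demographic": "group people ethnicity ancestry population patient patients",
    "competition": "contest speed running runner marathon finish sprint event",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose gloss shares the most words with the sentence.

    This is a simplified Lesk heuristic; ties go to the first sense listed.
    """
    context = tokenize(sentence)
    overlaps = {name: len(context & tokenize(gloss)) for name, gloss in senses.items()}
    return max(overlaps, key=overlaps.get)


def filter_candidates(sentences, keyword="race", wanted="demographic"):
    """Keep sentences that contain the keyword used in the wanted sense."""
    return [
        s for s in sentences
        if keyword in tokenize(s) and disambiguate(s) == wanted
    ]


candidates = [
    "Outcomes differed by race among patients in the study population.",
    "The runner collapsed after the marathon race near the finish.",
    "Blood pressure was recorded at each visit.",
]
kept = filter_candidates(candidates)
print(kept)
```

Only the first sentence survives: the second uses the keyword in an unrelated sense and the third does not contain it at all. A real pipeline would presumably rely on a trained WSD model and a curated sense inventory rather than hand-written glosses.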

Abstract

There have been growing concerns around high-stakes applications that rely on models trained with biased data, which consequently produce biased predictions, often harming the most vulnerable. In particular, biased medical data could cause health-related applications and recommender systems to create outputs that jeopardize patient care and widen disparities in health outcomes. A recent framework titled Fairness via AI posits that, instead of attempting to correct model biases, researchers must focus on their root causes by using AI to debias data. Inspired by this framework, we tackle bias detection in medical curricula using NLP models, including LLMs, and evaluate them on a gold standard dataset containing 4,105 excerpts annotated by medical experts for bias from a large corpus. We build on previous work by coauthors that augments the set of negative samples with non-annotated text…

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Biomedical Text Mining and Ontologies

Methods: Sparse Evolutionary Training · WordPiece · Residual Connection · Attention Dropout · Linear Layer · Discriminative Fine-Tuning · Multi-Head Attention · Linear Warmup With Linear Decay · Cosine Annealing