Fine-Tune, Don't Prompt, Your Language Model to Identify Biased Language in Clinical Notes
Isotta Landi, Eugenia Alleva, Nicole Bussola, Rebecca M. Cohen, Sarah Nowlin, Leslee J. Shaw, Alexander W. Charney, Kimberly B. Glazer

TL;DR
This study develops a fine-tuning approach for language models to detect biased language in clinical notes. The approach outperforms prompting methods, and the results underscore the need for specialty-specific adaptation to ensure accuracy and clinical relevance.
Contribution
The paper introduces a lexicon-based framework and shows that fine-tuning with lexically primed inputs detects bias in clinical text better than prompting methods do.
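The page does not say how a "lexically primed" input is built. The sketch below shows one plausible format, assuming the lexicon term that triggered a chunk is both marked inline and stated up front so the classifier knows which word to judge. The `[BIAS]` tags, the `term: … | text: …` layout, and the function names `prime_chunk` and `primed_input` are hypothetical illustrations, not the authors' format.

```python
import re


def prime_chunk(chunk: str, term: str,
                open_tag: str = "[BIAS]", close_tag: str = "[/BIAS]") -> str:
    """Wrap the first case-insensitive whole-word match of `term` in marker tags.

    Hypothetical priming scheme: the tag strings are illustrative only.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    # Replace only the first occurrence and keep the original casing of the match.
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", chunk, count=1)


def primed_input(chunk: str, term: str) -> str:
    """Prefix the matched term and mark it inline (assumed input layout)."""
    return f"term: {term} | text: {prime_chunk(chunk, term)}"


if __name__ == "__main__":
    print(primed_input("Patient was noncompliant with meds.", "noncompliant"))
```

Stating the term explicitly is one way to tell an encoder such as GatorTron which span carries the valence when a chunk contains several candidate words; other priming schemes (e.g. special tokens added to the tokenizer) would serve the same purpose.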
Findings
Fine-tuning outperforms prompting in bias classification.
GatorTron achieves an F1 score of 0.96 on OB-GYN data.
Cross-domain generalizability is limited without domain-specific training.
Abstract
Clinical documentation can contain emotionally charged language with stigmatizing or privileging valences. We present a framework for detecting and classifying such language as stigmatizing, privileging, or neutral. We constructed a curated lexicon of biased terms scored for emotional valence. We then used lexicon-based matching to extract text chunks from OB-GYN delivery notes (Mount Sinai Hospital, NY) and MIMIC-IV discharge summaries across multiple specialties. Three clinicians annotated all chunks, enabling characterization of valence patterns across specialties and healthcare systems. We benchmarked multiple classification strategies (zero-shot prompting, in-context learning, and supervised fine-tuning) across encoder-only models (GatorTron) and generative large language models (Llama). Fine-tuning with lexically primed inputs consistently outperformed prompting approaches.…
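The abstract describes using lexicon-based matching to pull text chunks out of notes before annotation. A minimal sketch of that step follows, assuming a lexicon that maps single-word terms to a valence score and a fixed window of surrounding words as the "chunk". The lexicon entries, scores, window size, and the names `LEXICON`, `Chunk`, and `extract_chunks` are hypothetical; the paper's actual lexicon, chunking rule, and multi-word handling are not described here.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    term: str       # lexicon term that triggered the match
    valence: float  # lexicon valence score (sign convention assumed)
    text: str       # surrounding context window


# Hypothetical entries: negative = stigmatizing, positive = privileging (assumed convention).
LEXICON = {"noncompliant": -0.8, "pleasant": 0.6}


def extract_chunks(note: str, lexicon: dict, window: int = 5) -> list:
    """Return one Chunk per single-word lexicon hit, with `window` words on each side."""
    tokens = note.split()
    # Normalize tokens for matching: strip leading/trailing punctuation and lowercase.
    normalized = [re.sub(r"^\W+|\W+$", "", t).lower() for t in tokens]
    chunks = []
    for i, tok in enumerate(normalized):
        if tok in lexicon:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            # Keep the original (unnormalized) tokens in the chunk text.
            chunks.append(Chunk(tok, lexicon[tok], " ".join(tokens[start:end])))
    return chunks


if __name__ == "__main__":
    note = "Pt is a pleasant 32yo G1P0. She was noncompliant with glucose monitoring."
    for c in extract_chunks(note, LEXICON, window=2):
        print(c)
```

Each extracted chunk would then go to annotators (here, three clinicians) for a stigmatizing, privileging, or neutral label; the valence score from the lexicon is only a prior, which is why the chunks still need human annotation.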
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · Artificial Intelligence in Healthcare and Education
