# Detecting stigmatizing language in clinical notes with large language models for addiction care

**Authors:** Rohan Sethi, John Caskey, Yanjun Gao, Matthew M. Churpek, Timothy A. Miller, Anoop Mayampurath, Elizabeth Salisbury-Afshar, Majid Afshar, Dmitry Dligach

PMC · DOI: 10.1038/s44401-026-00069-0 · npj Health Systems · 2026-02-02

## TL;DR

This study shows that large language models, especially a supervised fine-tuned Llama-3 8B Instruct, can accurately detect stigmatizing language in ICU clinical notes, a problem that particularly affects patients with substance use disorders.

## Contribution

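The study introduces a high-accuracy method that uses supervised fine-tuning of an LLM (Meta's Llama-3 8B Instruct) to detect stigmatizing language in clinical notes, and compares it against zero-shot prompting, in-context learning (with and without selective retrieval), and keyword search.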

## Key findings

- Supervised fine-tuning (SFT) achieved 97.2% accuracy on the held-out test set, the best of all approaches, followed by in-context learning.
- LLMs with in-context learning and SFT identified stigmatizing language missed during manual annotation.
- SFT achieved 97.9% accuracy on an external validation dataset from the University of Wisconsin Health System.
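
The two kinds of result above (an accuracy figure, and model-flagged notes that annotators had missed) can be illustrated with a minimal sketch. The `Note` record, its field names, and the toy data are hypothetical and are not taken from the paper; the sketch only shows how accuracy against human labels is computed and how disagreements could be queued for human re-review.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical record pairing a human label with a model prediction."""
    note_id: str
    annotated: bool  # human annotation: True = stigmatizing
    predicted: bool  # model prediction: True = stigmatizing


def accuracy(notes):
    """Fraction of notes where the prediction matches the annotation."""
    if not notes:
        raise ValueError("no notes to score")
    correct = sum(n.annotated == n.predicted for n in notes)
    return correct / len(notes)


def flag_for_review(notes):
    """IDs of notes the model called stigmatizing but annotators did not.

    These count as false positives against the annotation, yet some may be
    genuine stigmatizing language that was missed during manual annotation.
    """
    return [n.note_id for n in notes if n.predicted and not n.annotated]


# Toy data for illustration only.
notes = [
    Note("a", annotated=True, predicted=True),
    Note("b", annotated=False, predicted=False),
    Note("c", annotated=False, predicted=True),
    Note("d", annotated=True, predicted=True),
]

print(accuracy(notes))         # 0.75
print(flag_for_review(notes))  # ['c']
```

Reviewing the flagged notes by hand, rather than treating every disagreement as a model error, is one way such missed annotations could surface.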

## Abstract

Intensive care units (ICUs) produce numerous progress notes that may contain stigmatizing language that perpetuates negative biases and punitive approaches against patients. Patients with substance use disorders are particularly vulnerable to stigma. This study examined the performance of large language models (LLMs) in identifying stigmatizing language. We annotated a dataset of over 77,000 stigmatizing and non-stigmatizing notes from the MIMIC-III database. We used Meta’s Llama-3 8B Instruct LLM to run the following experiments for stigma detection: zero-shot; in-context learning; in-context learning with selective retrieval; supervised fine-tuning (SFT); and keyword search. All approaches were evaluated on a held-out test set and on an external validation dataset (University of Wisconsin Health System). SFT had the best performance with 97.2% accuracy, followed by in-context learning. The LLMs with in-context learning and SFT provided appropriate reasoning for false positives during human review. Both approaches identified clinical notes with stigmatizing language that were missed during annotation. SFT achieved 97.9% accuracy on the external validation dataset. LLMs, particularly with SFT and in-context learning, effectively identified stigmatizing language in ICU notes with high accuracy while explaining their reasoning in an asynchronous fashion, and they demonstrated the ability to identify novel stigmatizing language not explicitly present in the training data or existing guidelines.
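
To make two of the compared approaches concrete, here is a minimal sketch of a keyword-search baseline and of in-context learning with selective retrieval. Everything in it is an assumption for illustration: the keyword list, the example notes, the prompt wording, and the use of Jaccard token overlap for retrieval are hypothetical, since this summary does not describe the paper's lexicon, prompts, or retrieval method.

```python
import re

# Hypothetical keyword list for illustration; not the paper's lexicon.
KEYWORDS = ("abuser", "addict", "noncompliant")


def tokenize(text):
    """Lowercase a note and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_baseline(note):
    """Flag a note if it contains any listed keyword as a whole token."""
    return any(k in tokenize(note) for k in KEYWORDS)


def jaccard(a, b):
    """Token-overlap similarity; an assumed stand-in for the retriever."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_examples(query, pool, k=2):
    """Pick the k labeled examples most similar to the query note."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(query, pool, k=2):
    """Assemble a few-shot prompt from the retrieved examples."""
    lines = ["Classify each clinical note as stigmatizing or non-stigmatizing.", ""]
    for text, label in select_examples(query, pool, k):
        lines += [f"Note: {text}", f"Label: {label}", ""]
    lines += [f"Note: {query}", "Label:"]
    return "\n".join(lines)


# Invented example notes; not drawn from MIMIC-III.
pool = [
    ("Patient is a drug abuser and refuses care.", "stigmatizing"),
    ("Patient has opioid use disorder, on buprenorphine.", "non-stigmatizing"),
    ("Chest x-ray shows no acute process.", "non-stigmatizing"),
]
query = "Patient is an alcohol abuser."

print(keyword_baseline(query))
print(build_prompt(query, pool))
```

The resulting prompt string would then be sent to an instruction-tuned model. A keyword baseline can only match terms already on its list, which is consistent with the abstract's observation that the LLM approaches surfaced stigmatizing language absent from existing guidelines.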

## Full-text entities

- **Diseases:** addiction (MESH:D019966)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12864023/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12864023/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/PMC12864023/full.md

---
Source: https://tomesphere.com/paper/PMC12864023