CARE-SD: Classifier-based analysis for recognizing and eliminating stigmatizing and doubt marker labels in electronic health records: model development and validation
Drew Walker, Annie Thorne, Sudeshna Das, Jennifer Love, Hannah LF Cooper, Melvin Livingston III, Abeed Sarker

TL;DR
This paper develops and validates NLP classifiers to detect stigmatizing language and doubt markers in electronic health records, aiming to reduce bias and improve healthcare communication.
Contribution
It introduces a novel supervised classification approach using expanded lexicons and large-scale EHR data to identify stigmatizing and biased language.
Findings
Classifiers reached macro F1-scores of .84 and .79.
Final lexicons contain 58 doubt marker expressions and 127 stigmatizing labels.
Models closely match human annotator agreement with .87 accuracy.
Abstract
Objective: To detect and classify features of stigmatizing and biased language in intensive care electronic health records (EHRs) using natural language processing techniques. Materials and Methods: We first created lexicons and regular expression lists from literature-driven stem words for linguistic features of stigmatizing patient labels, doubt markers, and scare quotes within EHRs. The lexicons were further extended using Word2Vec and GPT-3.5, and refined through human evaluation. These lexicons were used to search for matches across 18 million sentences from the de-identified Medical Information Mart for Intensive Care-III (MIMIC-III) dataset. For each linguistic bias feature, 1000 sentence matches were sampled, labeled by expert clinical and public health annotators, and used to train supervised learning classifiers. Results: Lexicon development from expanded literature stem-word lists…
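The lexicon-matching and sampling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the lexicon terms, the scare-quote pattern, the function names, and the example sentences are all hypothetical, and the real study ran over MIMIC-III sentences with much larger, human-refined lexicons.

```python
import random
import re

# Hypothetical mini-lexicon for illustration only; the paper's actual
# lexicons (58 doubt markers, 127 stigmatizing labels) are not reproduced.
LEXICON = {
    "doubt_marker": ["claims", "insists", "allegedly"],
    "stigmatizing_label": ["noncompliant", "drug seeker"],
}

# Assumed scare-quote pattern: a short span inside double quotes.
SCARE_QUOTE = re.compile(r'"[^"]{1,40}"')


def compile_lexicon(terms):
    """Build one case-insensitive, word-bounded pattern from a term list."""
    # Longest terms first so multi-word expressions are tried before shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def find_matches(sentences, pattern):
    """Return (sentence, matched_text) for each sentence with at least one hit."""
    return [(s, m.group(0)) for s in sentences if (m := pattern.search(s))]


def sample_for_annotation(matches, k=1000, seed=0):
    """Draw up to k matches without replacement for human labeling."""
    if len(matches) <= k:
        return list(matches)
    return random.Random(seed).sample(matches, k)


# Made-up sentences standing in for de-identified clinical notes.
sentences = [
    "Patient claims pain is 10/10.",
    "Pt is a known drug seeker.",
    "Vitals stable overnight.",
    'He reports "chest pain" again.',
]

doubt_hits = find_matches(sentences, compile_lexicon(LEXICON["doubt_marker"]))
label_hits = find_matches(sentences, compile_lexicon(LEXICON["stigmatizing_label"]))
quote_hits = find_matches(sentences, SCARE_QUOTE)
to_annotate = sample_for_annotation(doubt_hits, k=1000)
```

A lexicon match only flags a candidate sentence. In the workflow the abstract describes, the sampled matches are then labeled by annotators and used to train classifiers that decide whether the matched expression is actually used in a stigmatizing or doubt-casting way.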
Taxonomy
Topics: HIV/AIDS Impact and Responses · Food Security and Health in Diverse Populations
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Dense Connections · Weight Decay · Multi-Head Attention · Cosine Annealing · Attention Dropout · Dropout
