Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods
Yujuan Fu, Giridhar Kaushik Ramachandran, Nicholas J Dobbins, Namu Park, Michael Leu, Abby R. Rosenberg, Kevin Lybarger, Fei Xia, Ozlem Uzuner, Meliha Yetisgen

TL;DR
This paper introduces PedSHAC, a new annotated corpus of pediatric social history notes, and evaluates large language models for extracting social determinants of health, reporting strong F1 scores for both fine-tuned and in-context learning approaches.
Contribution
It presents a novel annotated dataset and compares fine-tuned and in-context LLM methods for SDoH extraction in pediatric clinical notes.
Findings
Fine-tuned LLMs achieve 78.4 F1 on event arguments.
GPT-4 with in-context learning achieves 82.3 F1 on event triggers.
PedSHAC covers ten social health determinants with high annotation agreement.
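The F1 scores above combine precision and recall over extracted items. As a minimal sketch, assuming exact-match scoring over sets of tuples (the paper's actual matching criteria may differ), the metric can be computed as follows. The function name, tuple layout, and event-type labels are hypothetical and chosen only for illustration; they are not taken from PedSHAC.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    gold: Iterable[Hashable], predicted: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over two collections of items."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (note_id, event_type, trigger_text) tuples, for illustration only.
gold_triggers = [
    ("note1", "LivingArrangement", "lives with"),
    ("note1", "Education", "attends"),
    ("note2", "FoodInsecurity", "food bank"),
]
predicted_triggers = [
    ("note1", "LivingArrangement", "lives with"),
    ("note2", "FoodInsecurity", "food bank"),
    ("note2", "Education", "school"),
]

precision, recall, f1 = precision_recall_f1(gold_triggers, predicted_triggers)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Argument-level scores could be computed the same way by extending each tuple with an argument role and value, again under the exact-match assumption.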
Abstract
Social determinants of health (SDoH) play a critical role in shaping health outcomes, particularly in pediatric populations where interventions can have long-term implications. SDoH are frequently studied in the Electronic Health Record (EHR), which provides a rich repository for diverse patient data. In this work, we present a novel annotated corpus, the Pediatric Social History Annotation Corpus (PedSHAC), and evaluate the automatic extraction of detailed SDoH representations using fine-tuned and in-context learning methods with Large Language Models (LLMs). PedSHAC comprises annotated social history sections from 1,260 clinical notes obtained from pediatric patients within the University of Washington (UW) hospital system. Employing an event-based annotation scheme, PedSHAC captures ten distinct health determinants to encompass living and economic stability, prior trauma, education…
Taxonomy
Topics: Food Security and Health in Diverse Populations
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Dense Connections · Label Smoothing
