SynthEHR-Eviction: Enhancing Eviction SDoH Detection with LLM-Augmented Synthetic EHR Data
Zonghai Yao, Youxia Zhao, Avijit Mitra, David A. Levy, Emily Druhl, Jack Tsai, Hong Yu

TL;DR
This paper presents SynthEHR-Eviction, a scalable pipeline that uses large language models and human-in-the-loop methods to extract eviction-related social determinants of health from clinical notes, creating a large annotated dataset and improving detection accuracy.
Contribution
The paper introduces a pipeline combining LLMs, human-in-the-loop annotation, and automated prompt optimization (APO) for eviction SDoH detection, yielding the largest public eviction-related SDoH dataset to date (14 fine-grained categories) and improved model performance.
Findings
Fine-tuned LLMs (e.g., Qwen2.5, LLaMA3) achieved Macro-F1 scores of 88.8% on eviction and 90.3% on other SDoH, measured on human-validated data.
The pipeline reduces annotation effort by over 80%.
The fine-tuned models outperform GPT-4o-APO (87.8%, 87.3%), GPT-4o-mini-APO (69.1%, 78.1%), and BioBERT (60.7%, 68.3%) on the same two tasks.
Abstract
Eviction is a significant yet understudied social determinant of health (SDoH), linked to housing instability, unemployment, and mental health. While eviction appears in unstructured electronic health records (EHRs), it is rarely coded in structured fields, limiting downstream applications. We introduce SynthEHR-Eviction, a scalable pipeline combining LLMs, human-in-the-loop annotation, and automated prompt optimization (APO) to extract eviction statuses from clinical notes. Using this pipeline, we created the largest public eviction-related SDoH dataset to date, comprising 14 fine-grained categories. Fine-tuned LLMs (e.g., Qwen2.5, LLaMA3) trained on SynthEHR-Eviction achieved Macro-F1 scores of 88.8% (eviction) and 90.3% (other SDoH) on human-validated data, outperforming GPT-4o-APO (87.8%, 87.3%), GPT-4o-mini-APO (69.1%, 78.1%), and BioBERT (60.7%, 68.3%), while enabling…
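The scores above are Macro-F1 values. As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the unweighted mean of per-class F1, so rare categories count as much as common ones. This is not the paper's evaluation code, and the label names (`present`, `pending`, `absent`) are hypothetical placeholders, not the dataset's actual 14 categories.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
y_true = ["present", "present", "pending", "absent", "absent", "absent"]
y_pred = ["present", "pending", "pending", "absent", "absent", "present"]
print(f"Macro-F1: {macro_f1(y_true, y_pred):.3f}")
```

In this toy example the per-class F1 values are 0.5 (`present`), 2/3 (`pending`), and 0.8 (`absent`), and Macro-F1 is their plain average.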
Taxonomy
Topics: Food Security and Health in Diverse Populations · Machine Learning in Healthcare · Homelessness and Social Issues
