SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

Bernardo Consoli; Xizhi Wu; Song Wang; Xinyu Zhao; Yanshan Wang; Justin Rousseau; Tom Hartvigsen; Li Shen; Huanmei Wu; Yifan Peng; Qi Long; Tianlong Chen; Ying Ding

arXiv:2407.17126 · cs.CL · July 25, 2024

TL;DR

This paper introduces SDoH-GPT, a few-shot large language model approach that efficiently extracts social determinants of health from medical notes, reducing annotation effort and costs while maintaining high accuracy and consistency.

Contribution

The study presents a novel few-shot LLM method for SDoH extraction that requires minimal annotations, significantly lowering time and cost compared to traditional methods.

Findings

01. Achieved agreement with human annotators of up to 0.92 Cohen's kappa.
02. Reduced extraction time tenfold and extraction cost twentyfold.
03. Maintained AUROC above 0.90 across three datasets when SDoH-GPT was combined with XGBoost.
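The two headline figures are standard metrics. The following is a minimal sketch of how each is computed, not the authors' evaluation code.

- Cohen's kappa is κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies.
- AUROC is the probability that a randomly chosen positive note is scored above a randomly chosen negative one, with ties counting half.

The helper names `cohens_kappa` and `auroc` and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import product


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of each rater's marginal rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined,
        # so this sketch returns 1.0 by convention (p_o is also 1 here).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def auroc(labels, scores):
    """AUROC as the probability a positive outranks a negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive score against every negative score.
    for p, q in product(pos, neg):
        if p > q:
            wins += 1.0
        elif p == q:
            wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels: a human annotator vs. a model on six notes (made-up data).
    human = ["yes", "yes", "no", "no", "yes", "no"]
    model = ["yes", "yes", "no", "yes", "yes", "no"]
    print(round(cohens_kappa(human, model), 3))  # 0.667
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is O(P·N), which is fine for illustration. On large datasets, a rank-based (Mann-Whitney U) formulation gives the same AUROC more efficiently.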

Abstract

Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study, we introduced SDoH-GPT, a simple and effective few-shot Large Language Model (LLM) method that leverages contrastive examples and concise instructions to extract SDoH without relying on extensive medical annotations or costly human intervention. It achieved tenfold and twentyfold reductions in time and cost, respectively, and superior consistency with human annotators, as measured by Cohen's kappa of up to 0.92. The innovative combination of SDoH-GPT and XGBoost leverages the strengths of both, ensuring high accuracy and computational efficiency while consistently maintaining 0.90+ AUROC scores. Testing across three distinct datasets has confirmed its robustness and accuracy.…
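The abstract describes the prompting method only at a high level: few-shot prompting with contrastive examples and concise instructions. The sketch below shows one plausible way to assemble such a prompt and parse the reply. Everything in it is hypothetical and not taken from the paper, and no LLM is called:

- the SDoH factor ("housing instability"),
- the instruction wording,
- the example sentences,
- the `Example`, `build_prompt`, and `parse_answer` helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    note: str   # short excerpt from a (synthetic) clinical note
    label: str  # "yes" if the SDoH factor is present, otherwise "no"


def build_prompt(factor, definition, contrastive_pairs, target_note):
    """Assemble a concise few-shot prompt from (positive, near-miss negative) pairs.

    Showing each positive next to a similar-looking negative exposes the
    boundary of the category, not just typical positive cases.
    """
    lines = [
        f"Task: decide whether the note mentions {factor}.",
        f"Definition: {definition}",
        "Answer with exactly one word: yes or no.",
        "",
    ]
    for positive, negative in contrastive_pairs:
        for ex in (positive, negative):
            lines.append(f"Note: {ex.note}")
            lines.append(f"Answer: {ex.label}")
            lines.append("")
    # The target note goes last, with an open "Answer:" slot for the model.
    lines.append(f"Note: {target_note}")
    lines.append("Answer:")
    return "\n".join(lines)


def parse_answer(completion):
    """Map a free-text completion to 1 (yes) or 0 (no); None if unparseable."""
    stripped = completion.strip()
    if not stripped:
        return None
    word = stripped.split()[0].strip(".,").lower()
    return {"yes": 1, "no": 0}.get(word)


if __name__ == "__main__":
    pairs = [
        (
            Example("Patient is currently staying in a homeless shelter.", "yes"),
            Example("Patient volunteers at a homeless shelter on weekends.", "no"),
        )
    ]
    prompt = build_prompt(
        "housing instability",
        "lack of stable, safe housing",
        pairs,
        "Has been living in his car since an eviction last month.",
    )
    print(prompt)
    print(parse_answer(" Yes."))  # 1
```

Pairing each positive with a near-miss negative is only one reading of "contrastive examples"; the paper's actual prompts may be structured differently.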
