SODA: A Natural Language Processing Package to Extract Social Determinants of Health for Cancer Studies
Zehao Yu, Xi Yang, Chong Dang, Prakash Adekkanattu, Braja Gopal Patra, Yifan Peng, Jyotishman Pathak, Debbie L. Wilson, Ching-Yuan Chang, Wei-Hsuan Lo-Ciganic, Thomas J. George, William R. Hogan, Yi Guo, Jiang Bian, Yonghui Wu

TL;DR
This paper introduces SODA, an open-source NLP package utilizing transformer models to extract social determinants of health from cancer patient records, demonstrating good accuracy and generalizability to opioid use data.
Contribution
The study develops a novel NLP package with pre-trained transformers for SDoH extraction, including a new annotated corpus and strategies for improving model performance across domains.
Findings
BERT achieved high F1 scores in SDoH extraction.
Fine-tuning improved model performance on opioid use data.
Ten SDoH categories had an extraction rate above 70%.
Abstract
Objective: We aim to develop an open-source natural language processing (NLP) package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients, examine the generalizability of SODA to a new disease domain (i.e., opioid use), and evaluate the extraction rate of SDoH using cancer populations. Methods: We identified SDoH categories and attributes and developed an SDoH corpus using clinical notes from a general cancer cohort. We compared four transformer-based NLP models to extract SDoH, examined the generalizability of NLP models to a cohort of patients prescribed with opioids, and explored customization strategies to improve performance. We applied the best NLP model to extract 19 categories of SDoH from the breast (n=7,971), lung (n=11,804), and colorectal cancer (n=6,240) cohorts. Results and…
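The abstract refers to an "extraction rate" of SDoH in the cancer cohorts without defining it in the visible text. The sketch below assumes one plausible reading: the fraction of patients in a cohort with at least one extracted mention of a given category. The function names, the (patient_id, category) input format, and the category labels are hypothetical and are not taken from the SODA package.

```python
from collections import defaultdict


def extraction_rates(mentions, cohort_size):
    """Return {category: fraction of the cohort with >= 1 extracted mention}.

    `mentions` is an iterable of (patient_id, category) pairs, e.g. the
    flattened output of an NER model run over each patient's notes.
    This definition of "extraction rate" is an assumption for illustration.
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    patients_by_category = defaultdict(set)
    for patient_id, category in mentions:
        # A set de-duplicates repeated mentions for the same patient.
        patients_by_category[category].add(patient_id)
    return {
        category: len(patients) / cohort_size
        for category, patients in patients_by_category.items()
    }


def categories_above(rates, threshold=0.70):
    """List categories whose extraction rate exceeds `threshold`, sorted by name."""
    return sorted(c for c, r in rates.items() if r > threshold)


# Toy data with hypothetical category labels (not results from the paper).
toy_mentions = [
    ("p1", "smoking"), ("p1", "smoking"), ("p2", "smoking"),
    ("p3", "smoking"), ("p4", "smoking"), ("p1", "education"),
]
toy_rates = extraction_rates(toy_mentions, cohort_size=4)
high_coverage = categories_above(toy_rates)
```

Counting patients rather than mentions keeps the rate between 0 and 1 and prevents heavily documented patients from inflating a category's coverage.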
Taxonomy
Topics: Food Security and Health in Diverse Populations · Health Literacy and Information Accessibility
