A Study of Social and Behavioral Determinants of Health in Lung Cancer Patients Using Transformers-based Natural Language Processing Models
Zehao Yu, Xi Yang, Chong Dang, Songzi Wu, Prakash Adekkanattu, Jyotishman Pathak, Thomas J. George, William R. Hogan, Yi Guo, Jiang Bian, Yonghui Wu

TL;DR
This study evaluates transformer-based NLP models, BERT and RoBERTa, for extracting social and behavioral health determinants from clinical narratives in lung cancer patients, highlighting the importance of unstructured data for comprehensive health insights.
Contribution
It demonstrates the effectiveness of BERT-based models in extracting SBDoH information and compares NLP results with structured EHR data, emphasizing the need for combined data sources.
Findings
BERT achieved F1-scores of 0.8791 (strict) and 0.8999 (lenient).
NLP extracted more detailed SBDoH information than structured EHRs.
Combining narratives and structured data provides a more complete patient health profile.
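The Findings report F1 under strict and lenient matching. In clinical concept extraction, strict matching usually requires a predicted entity to have exactly the gold boundaries and type, while lenient matching accepts any overlap with a gold entity of the same type. The following Python sketch assumes those common definitions (the paper's own evaluation script is not shown here); the function names `prf1`, `exact_match`, `overlap_match` and the example labels are hypothetical.

```python
from typing import Callable, List, Tuple

# (start, end, label); end is exclusive. Offsets may be characters or tokens.
Span = Tuple[int, int, str]


def exact_match(a: Span, b: Span) -> bool:
    """Strict: same boundaries and same label."""
    return a == b


def overlap_match(a: Span, b: Span) -> bool:
    """Lenient: same label and at least one shared offset."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def prf1(gold: List[Span], pred: List[Span], strict: bool = True) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict or lenient span matching."""
    match: Callable[[Span, Span], bool] = exact_match if strict else overlap_match
    # Precision: fraction of predicted spans that match some gold span.
    correct_pred = sum(any(match(p, g) for g in gold) for p in pred)
    # Recall: fraction of gold spans matched by some predicted span.
    found_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = correct_pred / len(pred) if pred else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical annotations: the second prediction has a shifted left boundary.
gold = [(0, 13, "Smoking"), (20, 30, "Alcohol")]
pred = [(0, 13, "Smoking"), (22, 30, "Alcohol")]
print(prf1(gold, pred, strict=True))   # (0.5, 0.5, 0.5)
print(prf1(gold, pred, strict=False))  # (1.0, 1.0, 1.0)
```

Because an exact match is also an overlap, the lenient score in this formulation is never lower than the strict one, which is consistent with the reported 0.8999 (lenient) versus 0.8791 (strict). Other lenient variants exist, for example ones that enforce one-to-one matching between predicted and gold spans.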
Abstract
Social and behavioral determinants of health (SBDoH) play important roles in shaping people's health. In clinical research studies, especially comparative effectiveness studies, failure to adjust for SBDoH factors can cause confounding and misclassification errors in either statistical analyses or machine learning-based models. However, few studies have examined SBDoH factors in clinical outcomes because current electronic health record (EHR) systems lack structured SBDoH information, while much of this information is documented in clinical narratives. Natural language processing (NLP) is thus the key technology for extracting such information from unstructured clinical text. However, there is no mature clinical NLP system focusing on SBDoH. In this study, we examined two state-of-the-art transformer-based NLP models, including BERT and…
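Transformer models such as BERT and RoBERTa are commonly applied to this kind of extraction as token classification (named entity recognition): the model predicts one BIO tag per token (B- begins an entity, I- continues it, O is outside), and the tags are then decoded into entity spans. The sketch below assumes that framing, since the excerpted abstract does not spell out the tagging scheme; `bio_to_spans` and the labels `Smoking` and `LivingStatus` are hypothetical.

```python
from typing import List, Tuple

# (start, end, label) over token indices; end is exclusive.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into entity spans."""
    spans: List[Span] = []
    start, label = 0, None  # label is None when no entity is open
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any entity: close the open entity, if there is one.
            if label is not None:
                spans.append((start, i, label))
            label = None
        elif tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity begins. A stray I- tag whose type does not match the
            # open entity is also treated as a start (a common lenient convention).
            if label is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif not tag.startswith("I-"):
            raise ValueError(f"not a BIO tag: {tag!r}")
        # Otherwise the tag is I-<label>, continuing the open entity.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


# Hypothetical per-token predictions for one sentence of a clinical note.
tokens = ["Patient", "smokes", "one", "pack", "daily", "and", "lives", "alone", "."]
tags = ["O", "B-Smoking", "I-Smoking", "I-Smoking", "I-Smoking",
        "O", "B-LivingStatus", "I-LivingStatus", "O"]
spans = bio_to_spans(tags)
print(spans)  # [(1, 5, 'Smoking'), (6, 8, 'LivingStatus')]
print([" ".join(tokens[s:e]) for s, e, _ in spans])  # ['smokes one pack daily', 'lives alone']
```

Decoding to spans is what makes span-level metrics such as strict and lenient F1 possible, because those metrics compare whole entities rather than individual token tags.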
Taxonomy
Topics: Food Security and Health in Diverse Populations · Health Promotion and Cardiovascular Prevention · Nursing Diagnosis and Documentation
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Dropout · Softmax · Attention Dropout · Dense Connections · Layer Normalization
