Topic Modeling on Clinical Social Work Notes for Exploring Social Determinants of Health Factors
Shenghuan Sun, Travis Zack, Madhumita Sushil, Atul J. Butte

TL;DR
This study applies topic modeling to a large set of social work notes to uncover detailed social determinants of health, revealing rich, unique insights not captured in traditional medical records.
Contribution
Using LDA topic modeling, it demonstrates that social work notes contain valuable, previously underutilized information on social determinants of health.
Findings
Identified 11 key SDoH-related topics including financial status and abuse history.
Captured variation in social work notes across different patient conditions.
Showed social work notes provide rich, unique SDoH data.
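Topics like those listed above come from fitting LDA to tokenized notes. The following is a minimal sketch of LDA using collapsed Gibbs sampling in plain Python, not the authors' pipeline. The toy documents, the function names `fit_lda` and `top_words`, and the hyperparameters (`alpha`, `beta`, two topics, 200 iterations) are all assumptions made for illustration; the study itself used a far larger corpus and its own settings.

```python
import random
from collections import Counter

# Toy tokenized "notes"; invented for illustration, not from the study corpus.
docs = [
    ["housing", "rent", "shelter", "eviction", "rent"],
    ["insurance", "income", "rent", "benefits", "income"],
    ["abuse", "safety", "police", "shelter", "safety"],
    ["safety", "abuse", "counseling", "police"],
]

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling; returns per-document and per-topic counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # total tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

def top_words(topic_word, topic, n=3):
    """Return up to n most frequent words currently assigned to a topic."""
    return [w for w, c in topic_word[topic].most_common(n) if c > 0]

doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k in range(2):
    print(k, top_words(topic_word, k))
```

On a corpus this small the topics are not meaningful; the sketch only shows the mechanics. In practice a library implementation and a principled choice of the number of topics would be preferable.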
Abstract
Most research studying social determinants of health (SDoH) has focused on physician notes or structured elements of the electronic medical record (EMR). We hypothesize that clinical notes from social workers, whose role is to ameliorate social and economic factors, might provide a richer source of data on SDoH. We sought to perform topic modeling to identify robust topics of discussion within a large cohort of social work notes. We retrieved a diverse, deidentified corpus of 0.95 million clinical social work notes from 181,644 patients at the University of California, San Francisco. We used word frequency analysis and Latent Dirichlet Allocation (LDA) topic modeling analysis to characterize this corpus and identify potential topics of discussion. Word frequency analysis identified both medical and non-medical terms associated with specific ICD10 chapters. The LDA topic modeling…
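The word frequency analysis by ICD-10 chapter mentioned in the abstract can be illustrated with a small sketch. The example notes, the chapter labels used as keys, the stopword list, and the function name `term_frequencies_by_chapter` are hypothetical; the abstract does not describe the actual preprocessing, so this is only one plausible way to count terms per chapter.

```python
from collections import Counter, defaultdict

# Hypothetical (ICD-10 chapter, note text) pairs; made up for illustration.
notes = [
    ("Mental and behavioural disorders", "patient reports housing instability and anxiety"),
    ("Mental and behavioural disorders", "discussed anxiety coping and housing resources"),
    ("Neoplasms", "family support during chemotherapy and transportation needs"),
]

# A tiny, assumed stopword list; a real analysis would use a fuller one.
STOPWORDS = {"and", "the", "of", "during", "reports", "patient"}

def term_frequencies_by_chapter(pairs):
    """Count non-stopword tokens separately for each ICD-10 chapter."""
    counts = defaultdict(Counter)
    for chapter, text in pairs:
        tokens = [t for t in text.lower().split() if t not in STOPWORDS]
        counts[chapter].update(tokens)
    return counts

freqs = term_frequencies_by_chapter(notes)
for chapter, counter in freqs.items():
    print(chapter, counter.most_common(3))
```

Comparing the most frequent terms across chapters is what lets both medical terms (such as "chemotherapy") and non-medical terms (such as "housing") surface for specific diagnosis groups.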
Taxonomy
Topics: Health disparities and outcomes · Food Security and Health in Diverse Populations
Methods: Latent Dirichlet Allocation
