Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text
Avijit Mitra, Zhichao Yang, Emily Druhl, Raelene Goodwin, Hong Yu

TL;DR
Synth-SBDH is a synthetic, richly annotated dataset designed to improve the extraction of social and behavioral health factors from clinical texts, demonstrating high utility and cost-effectiveness across multiple tasks.
Contribution
This paper introduces Synth-SBDH, a comprehensive synthetic dataset with detailed annotations for SBDH, addressing limitations of existing datasets and enhancing model training and generalization.
Findings
Models trained on Synth-SBDH outperform counterparts trained without it by up to 63.75% macro-F
Synth-SBDH is effective for rare and under-resourced SBDH categories
Human evaluation shows 71.06% alignment with LLM assessments
Abstract
Social and behavioral determinants of health (SBDH) play a crucial role in health outcomes and are frequently documented in clinical text. Automatically extracting SBDH information from clinical text relies on publicly available good-quality datasets. However, existing SBDH datasets exhibit substantial limitations in their availability and coverage. In this study, we introduce Synth-SBDH, a novel synthetic dataset with detailed SBDH annotations, encompassing status, temporal information, and rationale across 15 SBDH categories. We showcase the utility of Synth-SBDH on three tasks using real-world clinical datasets from two distinct hospital settings, highlighting its versatility, generalizability, and distillation capabilities. Models trained on Synth-SBDH consistently outperform counterparts with no Synth-SBDH training, achieving up to 63.75% macro-F improvements. Additionally,…
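The headline number above is reported as macro-F. As a minimal sketch, assuming macro-F here means the unweighted mean of per-category F1 scores (the usual reading, but not confirmed by this summary), the following shows why the metric is sensitive to rare categories. The function name `macro_f1` and the toy labels `"a"` and `"b"` are hypothetical and are not taken from the paper or its data.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels (hypothetical): "a" is frequent, "b" is rare and always missed.
gold = ["a", "a", "a", "b"]
pred = ["a", "a", "a", "a"]

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
score = macro_f1(gold, pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

In this toy case accuracy stays at 0.75 while the macro score drops to about 0.43, because the missed rare label contributes an F1 of zero with the same weight as the frequent one.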
Taxonomy
Topics: Mental Health via Writing
