Creating Compact Regions of Social Determinants of Health
Barrett Lattimer, Alan Lattimer

TL;DR
This paper compares multiple regionalization algorithms on large-scale social determinant of health data, evaluating their scalability, geographic metrics, and performance in real-world applications.
Contribution
It provides a comprehensive comparison of state-of-the-art regionalization methods on large datasets, including new geographic metrics and memory analysis.
Findings
Agglomerative Clustering and SKATER perform well on large data.
Regionalization methods differ significantly in scalability and geographic accuracy.
Unconstrained K-Means is less effective than constrained methods for health data segmentation.
Abstract
Regionalization is the act of breaking a dataset into contiguous, homogeneous regions that are heterogeneous from each other. Many algorithms exist for performing regionalization; however, applying them to large real-world data sets has only become computationally feasible in recent years. Very few studies have compared different regionalization methods, and those that do lack analysis of memory, scalability, geographic metrics, and large-scale real-world applications. This study compares state-of-the-art regionalization methods, namely Agglomerative Clustering, SKATER, REDCAP, AZP, and Max-P-Regions, using real-world social determinant of health (SDOH) data. The scale of real-world SDOH data, up to 1 million data points in this study, not only compares the algorithms over different data sets but provides a stress test for each individual…
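To make the definition of regionalization concrete, here is a minimal sketch of contiguity-constrained agglomerative merging on a toy grid. It is an illustration only, not the implementation evaluated in the paper: the rook-adjacency grid, the single attribute per cell, the mean-difference merge rule, and all function names (`rook_adjacency`, `constrained_agglomerative`, `is_contiguous`) are assumptions introduced for this example.

```python
from collections import deque


def rook_adjacency(n_rows, n_cols):
    """Map each grid cell index to the set of cells sharing an edge with it."""
    adj = {}
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            nbrs = set()
            if r > 0:
                nbrs.add(i - n_cols)
            if r < n_rows - 1:
                nbrs.add(i + n_cols)
            if c > 0:
                nbrs.add(i - 1)
            if c < n_cols - 1:
                nbrs.add(i + 1)
            adj[i] = nbrs
    return adj


def constrained_agglomerative(values, adj, n_regions):
    """Repeatedly merge the two ADJACENT regions whose mean values are closest."""
    members = {i: [i] for i in range(len(values))}
    region_adj = {i: set(nbrs) for i, nbrs in adj.items()}
    mean = {i: float(v) for i, v in enumerate(values)}
    while len(members) > n_regions:
        best = None
        for a in members:
            for b in region_adj[a]:
                if a < b:  # visit each adjacent pair once
                    gap = abs(mean[a] - mean[b])
                    if best is None or gap < best[0]:
                        best = (gap, a, b)
        if best is None:
            break  # no adjacent pairs remain (disconnected input)
        _, a, b = best
        # Merge region b into region a and update the size-weighted mean.
        na, nb = len(members[a]), len(members[b])
        mean[a] = (mean[a] * na + mean[b] * nb) / (na + nb)
        members[a].extend(members.pop(b))
        del mean[b]
        # Neighbors of b become neighbors of a.
        for n in region_adj.pop(b):
            region_adj[n].discard(b)
            if n != a:
                region_adj[n].add(a)
                region_adj[a].add(n)
    labels = [0] * len(values)
    for k, (_, cells) in enumerate(sorted(members.items())):
        for cell in cells:
            labels[cell] = k
    return labels


def is_contiguous(labels, adj):
    """Return True if every region is a single connected component."""
    for lab in set(labels):
        cells = {i for i, l in enumerate(labels) if l == lab}
        start = next(iter(cells))
        seen, queue = {start}, deque([start])
        while queue:
            cur = queue.popleft()
            for n in adj[cur]:
                if n in cells and n not in seen:
                    seen.add(n)
                    queue.append(n)
        if seen != cells:
            return False
    return True


# Toy 3x4 grid: low values on the left, high on the right,
# plus one low-valued cell isolated in the bottom-right corner.
adj = rook_adjacency(3, 4)
values = [1, 1, 9, 9,
          1, 2, 8, 9,
          1, 1, 9, 1]
labels = constrained_agglomerative(values, adj, n_regions=2)
print(labels, is_contiguous(labels, adj))
```

Because only adjacent regions are ever merged, every resulting region stays connected. An unconstrained clustering on the attribute alone would be free to group the isolated low-valued corner cell with the distant low-valued cells on the left, which would break contiguity.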
Taxonomy
Topics: Health disparities and outcomes
Methods: Test · k-Means Clustering
