How Your Location Relates to Health: Variable Importance and Interpretable Machine Learning for Environmental and Sociodemographic Data
Ishaan Maitra, Raymond Lin, Eric Chen, Jon Donnelly, Sanja Šćepanović, Cynthia Rudin

TL;DR
This paper introduces an interpretable machine learning framework using GAMs and MGWR to analyze environmental and sociodemographic factors affecting health outcomes across locations, revealing key predictors and regional variations.
Contribution
It applies several variable importance techniques to the newly available MEDSAT dataset and develops an interpretable framework, based on GAMs and MGWR, to analyze the local and global spatial dependencies of predictors across multiple health outcomes.
Findings
NO2 identified as a global predictor for asthma, hypertension, and anxiety
Regional variations in predictors like air pollution and solar radiation
COVID-19 influenced local predictor importance
Abstract
Health outcomes depend on complex environmental and sociodemographic factors whose effects change over location and time. Only recently has fine-grained spatial and temporal data become available to study these effects, namely the MEDSAT dataset of English health, environmental, and sociodemographic information. Leveraging this new resource, we use a variety of variable importance techniques to robustly identify the most informative predictors across multiple health outcomes. We then develop an interpretable machine learning framework based on Generalized Additive Models (GAMs) and Multiscale Geographically Weighted Regression (MGWR) to analyze both local and global spatial dependencies of each variable on various health outcomes. Our findings identify NO2 as a global predictor for asthma, hypertension, and anxiety, alongside other outcome-specific predictors related to occupation,…
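The distinction between global and local spatial dependence mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it uses plain geographically weighted regression with a single fixed Gaussian bandwidth and one predictor, whereas MGWR allows a separate bandwidth per covariate. The data are synthetic, and every name here (`gaussian_weights`, `weighted_slope`, `local_slopes`, the bandwidth value) is a hypothetical chosen for illustration.

```python
import math
import random


def gaussian_weights(coords, center, bandwidth):
    """Gaussian kernel weights that decay with distance from `center`."""
    return [math.exp(-0.5 * (math.dist(c, center) / bandwidth) ** 2) for c in coords]


def weighted_slope(x, y, w):
    """Slope b of the weighted least-squares line y ~ a + b * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    var = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return cov / var


def local_slopes(coords, x, y, bandwidth):
    """Fit one kernel-weighted regression per location (plain GWR, fixed bandwidth)."""
    return [weighted_slope(x, y, gaussian_weights(coords, c, bandwidth)) for c in coords]


# Synthetic data (assumption): the effect of x on y grows from west (0) to east (1).
rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(400)]
x = [rng.gauss(0, 1) for _ in coords]
y = [(1.0 + 2.0 * c[0]) * xi + rng.gauss(0, 0.1) for c, xi in zip(coords, x)]

# A single global fit (uniform weights) averages over the spatial variation.
global_slope = weighted_slope(x, y, [1.0] * len(x))

# Local fits recover how the effect changes across space.
slopes = local_slopes(coords, x, y, bandwidth=0.1)
west = [s for s, c in zip(slopes, coords) if c[0] < 0.2]
east = [s for s, c in zip(slopes, coords) if c[0] > 0.8]

print("global slope:", round(global_slope, 2))
print("mean local slope, west:", round(sum(west) / len(west), 2))
print("mean local slope, east:", round(sum(east) / len(east), 2))
```

In this toy setting the global fit reports one averaged coefficient, while the local fits show the coefficient rising from west to east. A smaller bandwidth makes the estimates more local but noisier.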
Taxonomy
Topics: Health, Environment, Cognitive Aging · Data-Driven Disease Surveillance · Health disparities and outcomes
