Application and Validation of Geospatial Foundation Model Data for the Prediction of Health Facility Programmatic Outputs -- A Case Study in Malawi

Lynn Metz; Rachel Haggard; Michael Moszczynski; Samer Asbah; Chris Mwase; Patricia Khomani; Tyler Smith; Hannah Cooper; Annie Mwale; Arbaaz Muslim; Gautam Prasad; Mimi Sun; Tomer Shekel; Joydeep Paul; Anna Carter; Shravya Shetty; Dylan Green

arXiv:2510.25954 · cs.LG · October 31, 2025

TL;DR

This study evaluates whether geospatial foundation model (GeoFM) embeddings improve the prediction of health programmatic outputs in Malawi. It finds modest but consistent gains over traditional methods and highlights the potential of integrating multiple GeoFM sources.

Contribution

The study applies multiple GeoFM embeddings to health prediction in low- and middle-income countries (LMICs) and demonstrates their combined utility relative to traditional geospatial interpolation methods.
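
The summary does not say which traditional interpolation methods served as the baseline. Inverse distance weighting (IDW) is one common choice, so the sketch below uses it purely as an illustration, not as the paper's method: each catchment area's value is predicted as an average of the other areas' observed values, weighted by 1/d^p. The function names `idw_predict` and `loo_idw`, the power p = 2, and the use of planar (x, y) coordinates instead of latitude/longitude are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def idw_predict(target: Point, known_points: Sequence[Point],
                known_values: Sequence[float], power: float = 2.0) -> float:
    """Hypothetical IDW baseline: mean of known values weighted by 1 / d**power.

    Assumes planar (x, y) coordinates; lat/lon would need projecting first.
    """
    weighted = []
    for (x, y), value in zip(known_points, known_values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            # Target sits exactly on an observed location: return that value.
            return value
        weighted.append((1.0 / d ** power, value))
    total_weight = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total_weight


def loo_idw(points: Sequence[Point], values: Sequence[float],
            power: float = 2.0) -> List[float]:
    """Leave-one-out: predict each area from all the other areas."""
    preds = []
    for i, p in enumerate(points):
        other_points = list(points[:i]) + list(points[i + 1:])
        other_values = list(values[:i]) + list(values[i + 1:])
        preds.append(idw_predict(p, other_points, other_values, power))
    return preds


# Usage with four hypothetical catchment centroids on a unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [10.0, 20.0, 30.0, 40.0]
# Equidistant from all four, so weights are equal: the mean (25) up to float rounding.
print(idw_predict((0.5, 0.5), points, values))
print(loo_idw(points, values))
```

Leave-one-out prediction mirrors how a purely spatial baseline fills in an area from its neighbors' reports. Real catchment centroids would first need projecting to a planar coordinate system, or the Euclidean distance would need replacing with a great-circle (haversine) distance.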

Findings

01. GeoFM embeddings improved predictions for 87% of indicators.

02. The multi-GeoFM model achieved higher R² scores for key health metrics.

03. Prediction accuracy was limited for targets with little primary data available.
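
Findings 01 and 02 both rest on comparing a GeoFM-based model against a baseline, indicator by indicator, using R². A minimal sketch of that bookkeeping follows, assuming R² is the usual 1 - SS_res / SS_tot on held-out predictions and that "improved" means a strictly higher R². The helper names and the three indicator values in the usage lines are hypothetical placeholders, not figures from the paper.

```python
from typing import Mapping, Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Raises ZeroDivisionError if y_true is constant (SS_tot == 0).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def share_improved(baseline_r2: Mapping[str, float],
                   geofm_r2: Mapping[str, float]) -> float:
    """Fraction of indicators whose GeoFM-model R^2 strictly beats the baseline."""
    improved = sum(1 for name, r2 in baseline_r2.items() if geofm_r2[name] > r2)
    return improved / len(baseline_r2)


# Hypothetical per-indicator R^2 values (placeholders, not the paper's results).
baseline = {"indicator_a": 0.30, "indicator_b": 0.45, "indicator_c": 0.10}
geofm = {"indicator_a": 0.38, "indicator_b": 0.44, "indicator_c": 0.15}
print(share_improved(baseline, geofm))  # 2 of the 3 hypothetical indicators improved
```

R² can fall below zero when a model predicts worse than the training mean, which is worth keeping in mind when reading results for low-data targets such as those in finding 03.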

Abstract

The reliability of routine health data in low- and middle-income countries (LMICs) is often constrained by reporting delays and incomplete coverage, which motivates the exploration of novel data sources and analytics. Geospatial Foundation Models (GeoFMs) offer a promising avenue by synthesizing diverse spatial, temporal, and behavioral data into mathematical embeddings that can be used efficiently for downstream prediction tasks. This study evaluated the predictive performance of embeddings from three GeoFM sources for modeling 15 routine health programmatic outputs in Malawi: the Google Population Dynamics Foundation Model (PDFM), Google AlphaEarth (derived from satellite imagery), and mobile phone call detail records (CDR). It also compared their utility to that of traditional geospatial interpolation methods. We used XGBoost models on data from 552 health catchment areas (January 2021 to May 2023),…
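
The abstract describes XGBoost models trained on GeoFM embeddings for 552 health catchment areas. The sketch below illustrates the general shape of a multi-GeoFM setup under several assumptions: each source's embeddings are already aggregated to one row per catchment area, the sources are combined by simple column-wise concatenation, and performance is scored as out-of-fold R² from k-fold cross-validation. To run with NumPy alone it swaps in closed-form ridge regression for XGBoost. The function names, fold count, regularization strength, embedding dimensions, and synthetic data are all hypothetical.

```python
import numpy as np


def combine_sources(*embeddings: np.ndarray) -> np.ndarray:
    """Concatenate per-area embedding matrices column-wise.

    Assumes every matrix has one row per catchment area, in the same order.
    """
    n_rows = embeddings[0].shape[0]
    if any(e.shape[0] != n_rows for e in embeddings):
        raise ValueError("all sources must have one row per catchment area")
    return np.hstack(embeddings)


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge regression; returns [intercept, coef_1, ..., coef_d]."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict_ridge(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ w


def cv_r2(X: np.ndarray, y: np.ndarray, k: int = 5,
          alpha: float = 1.0, seed: int = 0) -> float:
    """Out-of-fold R^2 from k-fold cross-validation (ridge stands in for XGBoost)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    preds = np.empty(len(y), dtype=float)
    for test_idx in np.array_split(order, k):
        train_idx = np.setdiff1d(order, test_idx)
        # Standardize with training-fold statistics only, to avoid leakage.
        mu = X[train_idx].mean(axis=0)
        sd = X[train_idx].std(axis=0)
        sd[sd == 0] = 1.0
        w = fit_ridge((X[train_idx] - mu) / sd, y[train_idx], alpha)
        preds[test_idx] = predict_ridge(w, (X[test_idx] - mu) / sd)
    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic stand-ins for three embedding sources over 552 catchment areas.
# Dimensions are small hypothetical values, not the real embedding sizes.
rng = np.random.default_rng(42)
n_areas = 552
pdfm = rng.normal(size=(n_areas, 16))
alphaearth = rng.normal(size=(n_areas, 16))
cdr = rng.normal(size=(n_areas, 8))
# Toy target that depends on two of the sources, plus noise.
y = pdfm[:, 0] + alphaearth[:, 0] + 0.1 * rng.normal(size=n_areas)

print("PDFM only:", cv_r2(pdfm, y))
print("All three:", cv_r2(combine_sources(pdfm, alphaearth, cdr), y))
```

Because the toy target is built from two of the sources, the concatenated matrix wins here by construction; on real data the gain from combining sources has to be measured, and the paper summarizes it as modest but consistent. Replacing `fit_ridge` with a gradient-boosted tree model such as XGBoost would leave the cross-validation loop in place, though tree models do not need the standardization step.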
