Using Synthetic Data for Machine Learning-based Childhood Vaccination Prediction in Narok, Kenya
Jimmy Bach, Yang Li, Yaqi Liu, John Sankok, Rose Kimani, Carrie B. Dolan, Julius N. Odhiambo, Haipeng Chen

TL;DR
This study demonstrates that machine learning models can accurately identify children at risk of missing vaccines in Narok County, Kenya, and that training on synthetic data can preserve privacy without sacrificing predictive accuracy.
Contribution
The paper introduces a novel use of tabular diffusion-based synthetic data to enhance privacy in machine learning models for health informatics in low-resource settings.
Findings
Models achieved over 90% recall, precision, and F1-score in predicting vaccination risk.
Synthetic data training maintained model performance while protecting individual privacy.
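For readers less familiar with the metrics named above, the following is a minimal sketch of how precision, recall, and F1-score are computed for a binary classifier. The labels are hypothetical toy values, not data or results from the study, and the function name `precision_recall_f1` and the convention that `1` marks a child at risk of missing a vaccine are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class.

    y_true and y_pred are equal-length sequences of labels.
    Here `positive` (assumed to be 1) stands for "at risk of missing a vaccine".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical toy labels (not from the paper): 1 = at risk, 0 = not at risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this setting recall is arguably the metric to watch most closely, since a false negative corresponds to an at-risk child who is not flagged for follow-up.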
Abstract
Background: Limited data utilization in low-resource settings poses a barrier to the vaccine delivery ecosystem, undermining efforts to achieve equitable immunization coverage. In nomadic populations, individuals face an increased risk of missing crucial vaccination doses as children. One such population is the Maasai in Narok County, Kenya, where the absence of high-volume, quality data hampers accurate coverage estimates, impedes efficient resource allocation, and weakens the ability to deliver timely interventions. Additionally, data privacy concerns are heightened in groups with limited sensitive data. Objectives: First, we aim to identify children at risk of missing key vaccines across a large population to provide timely, evidence-based interventions that support increased vaccination coverage. Second, we aim to better protect the privacy of sensitive health data in a vulnerable…
