NHANES-GCP: Leveraging the Google Cloud Platform and BigQuery ML for reproducible machine learning with data from the National Health and Nutrition Examination Survey
B. Ross Katz, Abdul Khan, James York-Winegar, and Alexander J. Titus

TL;DR
This paper introduces NHANES-GCP, a cost-effective, automated, and reproducible cloud-based platform on Google Cloud for managing NHANES data and performing machine learning analyses using BigQuery ML.
Contribution
It presents a novel infrastructure-as-code solution that automates NHANES data management and enables integrated machine learning workflows on GCP.
Findings
Costs less than $2 to run and $15/year to host the data
Automates data engineering and cleaning processes
Supports end-to-end machine learning workflows within SQL-like queries
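To make the last finding concrete, the sketch below shows how a model-training step can be written as a single SQL statement in BigQuery ML. It is only an illustration, not the authors' code: the dataset, table, model, and column names (`nhanes.demographics_clean`, `nhanes.diabetes_model`, `has_diabetes`, `age_years`, `bmi`) and the helper `build_create_model_sql` are hypothetical. The `CREATE OR REPLACE MODEL ... OPTIONS (...) AS SELECT ...` form and the `model_type` and `input_label_cols` options are standard BigQuery ML syntax.

```python
from typing import List


def build_create_model_sql(
    model: str, source_table: str, label: str, features: List[str]
) -> str:
    """Compose a BigQuery ML statement that trains a logistic regression.

    All identifiers passed in are placeholders chosen for illustration;
    they are not names defined by NHANES-GCP.
    """
    # Select the feature columns plus the label column from the source table.
    columns = ",\n  ".join(features + [label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'LOGISTIC_REG', "
        f"input_label_cols = ['{label}']) AS\n"
        f"SELECT\n  {columns}\n"
        f"FROM `{source_table}`\n"
        # Rows with a missing label cannot be used for supervised training.
        f"WHERE {label} IS NOT NULL"
    )


# Hypothetical table and column names, assumed for this example only.
sql = build_create_model_sql(
    model="nhanes.diabetes_model",
    source_table="nhanes.demographics_clean",
    label="has_diabetes",
    features=["age_years", "bmi"],
)
print(sql)
```

The generated statement could then be run in the BigQuery console or submitted through a client library. Evaluation and prediction follow the same pattern with `ML.EVALUATE` and `ML.PREDICT`, so the whole workflow can stay in SQL.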
Abstract
Summary: NHANES, the National Health and Nutrition Examination Survey, is a program of studies led by the Centers for Disease Control and Prevention (CDC) designed to assess the health and nutritional status of adults and children in the United States (U.S.). Biostatisticians and clinical scientists frequently use NHANES data to study health trends across the U.S., but every analysis requires extensive data management and cleaning before use. This repetitive data engineering collectively costs valuable research time and decreases the reproducibility of analyses. Here, we introduce NHANES-GCP, a set of Cloud Development Kit for Terraform (CDKTF) Infrastructure-as-Code (IaC) and Data Build Tool (dbt) resources built on the Google Cloud Platform (GCP) that automates the data engineering and management aspects of working with NHANES data. With current GCP pricing, NHANES-GCP costs less…
Taxonomy
Topics: Nutritional Studies and Diet · Health, Environment, Cognitive Aging
