# Linking deliveries to newborns using nationwide Medicaid data

**Authors:** Lilla Orr, Basil Seif, Sun Jeon, Elisa Cascardi, Sakshina Bhatt, Jonas Swartz, Maria Isabel Rodriguez, Lee Sanders, Fernando Mendoza, Jens Hainmueller

PMC · DOI: 10.1186/s12874-025-02688-x · 2025-10-24

## TL;DR

This paper introduces a new algorithm to link mothers and newborns in Medicaid data without using personal identifiers, enabling large-scale health research.

## Contribution

A scalable algorithm for linking mother-infant pairs in Medicaid data across states and time, without relying on names or addresses.

## Key findings

- The algorithm successfully linked 11.68 million mother-infant pairs, covering 68% of Medicaid-enrolled infants.
- The linked cohort is representative of Medicaid beneficiaries on key demographics like race, age, and region.

## Abstract

Linking mothers to their newborns in health records is crucial for understanding the impact of policies, programs, and medical treatments on intergenerational health outcomes. While previous studies have used shared identifiers for linkage, such data are often unavailable in Medicaid records due to privacy concerns. Existing algorithms are not sufficiently flexible to accommodate Medicaid data from all states and from both the Medicaid Analytic Extract (MAX) and Transformed Medicaid Statistical Information System (T-MSIS) Analytic Files (TAF) data systems.

We present a scalable framework and linking algorithm that connects deliveries and infants without relying on names, addresses, or linkage to vital records. First, we confirm our ability to identify newborn beneficiaries and deliveries resulting in live birth across states and over time by comparing our findings to the total number of Medicaid births recorded in the National Vital Statistics System (NVSS). Second, we confirm that our algorithm accommodates variations in Medicaid records over time and across states for MAX and TAF data, supporting matches at different levels of stringency. Finally, we assess the extent to which our algorithm is effective across demographic groups.
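The idea of matching at different levels of stringency can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state which fields are used, so the linking keys here (`state`, a shared `case_id`, and the gap between `delivery_date` and `birth_date`) and the `max_day_gap` parameter are hypothetical assumptions chosen for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Delivery:
    mother_id: str
    state: str
    case_id: str          # hypothetical shared household/case identifier
    delivery_date: date


@dataclass(frozen=True)
class Infant:
    infant_id: str
    state: str
    case_id: str          # hypothetical shared household/case identifier
    birth_date: date


def link_dyads(deliveries, infants, max_day_gap=0):
    """Return (mother_id, infant_id) pairs.

    An infant is linked only when exactly one mother in the same state and
    case has a delivery within `max_day_gap` days of the infant's birth date.
    A larger `max_day_gap` is a less stringent match.
    """
    by_key = defaultdict(list)
    for d in deliveries:
        by_key[(d.state, d.case_id)].append(d)

    links = []
    for i in infants:
        candidates = [
            d for d in by_key.get((i.state, i.case_id), [])
            if abs((d.delivery_date - i.birth_date).days) <= max_day_gap
        ]
        mothers = {d.mother_id for d in candidates}
        if len(mothers) == 1:  # skip ambiguous or unmatched infants
            links.append((mothers.pop(), i.infant_id))
    return links


# Toy records, invented for illustration.
deliveries = [
    Delivery("M1", "CA", "C100", date(2015, 3, 10)),
    Delivery("M2", "TX", "C200", date(2016, 7, 1)),
]
infants = [
    Infant("I1", "CA", "C100", date(2015, 3, 10)),
    Infant("I2", "TX", "C200", date(2016, 7, 3)),   # dates differ by 2 days
    Infant("I3", "NY", "C300", date(2017, 1, 1)),   # no delivery on file
]

strict = link_dyads(deliveries, infants, max_day_gap=0)
relaxed = link_dyads(deliveries, infants, max_day_gap=3)
```

In this toy data the strict setting links only the pair with identical dates, while the relaxed setting also recovers the pair whose recorded dates differ by two days. The unmatched infant stays unlinked under both settings, which mirrors the trade-off between coverage and match certainty.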

Using data from all 50 states over 9 years, our algorithm linked 11.68 million mother-infant dyads, covering 68% of Medicaid-enrolled infants and over 30% of all U.S. infants. Our linked cohort is approximately representative of the broader population of Medicaid beneficiaries on key observable characteristics including race and ethnicity, age, gender, and region. However, linked beneficiaries are more likely to be white and from the Midwest or Northeast relative to those we are unable to link.
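As a rough check on scale, the reported figures imply an approximate denominator. The sketch below is back-of-the-envelope arithmetic only: it assumes each linked dyad corresponds to one infant and treats the rounded 68% as exact, so the results are approximations rather than numbers reported in the paper.

```python
linked_dyads = 11.68e6            # linked mother-infant dyads (from the abstract)
share_of_medicaid_infants = 0.68  # reported coverage, rounded in the abstract

# Implied number of Medicaid-enrolled infants identified over the study period.
implied_medicaid_infants = linked_dyads / share_of_medicaid_infants

# Implied number of Medicaid-enrolled infants that could not be linked.
implied_unlinked = implied_medicaid_infants - linked_dyads

print(f"Implied Medicaid infants: {implied_medicaid_infants / 1e6:.1f} million")
print(f"Implied unlinked infants: {implied_unlinked / 1e6:.1f} million")
```

Under these assumptions, roughly 17.2 million Medicaid-enrolled infants were identified, of whom about 5.5 million remain unlinked.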

Despite substantial variation in the nature of Medicaid data across states and over time, it is possible to identify family units in all states between 2011 and 2019 without linking claims to vital records. This algorithm will facilitate research on social determinants of health and the intergenerational effects of medical interventions and public policy.

The online version contains supplementary material available at 10.1186/s12874-025-02688-x.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12553173/full.md

---
Source: https://tomesphere.com/paper/PMC12553173