# Appropriate definition of childbirth using Japanese administrative database: a cross-sectional cohort validation study

**Authors:** Miyuki Koizumi, Hiroki Nakajima, Yuichi Nishioka, Emiri Morita, Tomoya Myojin, Tatsuya Noda, Tomoaki Imamura, Yutaka Takahashi

PMC · DOI: 10.1186/s12884-026-08797-9 · 2026-02-11

## TL;DR

This study develops and validates algorithms to accurately identify childbirth events using claims data in Japan, where direct mother-child linkage is not available.

## Contribution

The study introduces and validates a claims-based algorithm for identifying childbirth in the absence of direct mother-child linkage.

## Key findings

- The algorithm [A+susp] or [B] or [C] achieved high specificity (98.9%) and moderate sensitivity (66.9%) for identifying childbirth.
- For second childbirth, the same algorithm reached its highest Youden Index (0.57) at the 11-month difference.
- The validated algorithm can improve the accuracy of childbirth-related research using claims databases.

## Abstract

Claims data analyses are useful in clinical research. However, evidence on the validity of claims-based algorithms for identifying childbirth remains limited, particularly in settings where mother–child linkage is unavailable. Therefore, we aimed to develop and validate algorithms to identify childbirth from a claims database.

We used the DeSC database, which includes claims data for approximately 13 million people. Eighteen algorithms were designed using combinations of diagnosis-related codes with/without a suspected flag regarding childbirth ([A+susp]/[A]), medical procedure codes [B], and medication codes [C]. We used the parent–child identifier (ID) in the DeSC database as the gold standard; it is assigned based on family relationship information recorded in the insurer-managed registry of insured persons. The parent–child ID links children to an insured parent within the same insurance unit, enabling mother–child linkage. The gold standard for the month and year of childbirth was defined as the child’s month and year of birth among women aged 15–49 years linked by parent–child IDs during the observation period. We calculated sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), Kappa Index, and Youden Index for each algorithm. We also validated the algorithms for estimating a second childbirth during the observation period, which is useful in defining childbirth. For this analysis, identification of the second childbirth began 2–24 months after the first, given that the average age difference was two years.
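The combination logic described above can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the `ClaimMonth` record, its field names, and both helper functions are hypothetical, and the actual code lists behind [A], [B], and [C] are not given here. The sketch also assumes that [A+susp] counts diagnosis-related codes regardless of the suspected flag, while [A] counts only diagnoses without it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimMonth:
    """One woman-month of claims, reduced to hypothetical boolean flags."""
    has_diagnosis: bool        # childbirth-related diagnosis code present
    diagnosis_suspected: bool  # that diagnosis carries the suspected flag
    has_procedure: bool        # childbirth-related medical procedure code [B]
    has_medication: bool       # childbirth-related medication code [C]


def flags_childbirth(m: ClaimMonth, include_suspected: bool = True) -> bool:
    """Evaluate "[A+susp] or [B] or [C]" for one claim month.

    With include_suspected=False this becomes "[A] or [B] or [C]"
    (assumed meaning: suspected diagnoses no longer count).
    """
    diagnosis_hit = m.has_diagnosis and (
        include_suspected or not m.diagnosis_suspected
    )
    return diagnosis_hit or m.has_procedure or m.has_medication


def in_second_birth_window(months_since_first: int,
                           lo: int = 2, hi: int = 24) -> bool:
    """True if a month lies 2-24 months after the first childbirth."""
    return lo <= months_since_first <= hi


# A month whose only evidence is a suspected diagnosis.
suspected_only = ClaimMonth(True, True, False, False)
print(flags_childbirth(suspected_only))                           # True
print(flags_childbirth(suspected_only, include_suspected=False))  # False
```

Reducing each code family to a boolean keeps the eighteen candidate algorithms expressible as simple and/or combinations of the same flags.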

A total of 854,626 women were included in this study, of whom 37,934 were aged 15–49 years at the time of parent–child ID assignment and classified as experiencing childbirth during the observation period. The best-performing algorithm was “[A+susp] or [B] or [C]” (sensitivity: 66.9%, specificity: 98.9%, PPV: 73.7%, NPV: 98.5%, Kappa Index: 0.69, and Youden Index: 0.66). For second childbirth, algorithm “[A+susp] or [B] or [C]” had its highest Youden Index, 0.57, at the 11-month difference.
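For readers who want to reproduce these measures, the sketch below computes all six from a 2×2 table using their standard definitions. The function name `diagnostic_metrics` and the toy counts are made up for illustration; they are not the study's confusion matrix, which is not reported in this summary. The final line only checks that the reported sensitivity and specificity are consistent with the reported Youden Index.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard validation measures from a 2x2 table (algorithm vs. gold standard)."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (observed - expected) / (1 - expected)
    # Youden Index: J = sensitivity + specificity - 1.
    youden = sensitivity + specificity - 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
        "youden": youden,
    }


# Toy counts, invented for illustration only.
toy = diagnostic_metrics(tp=40, fp=10, fn=20, tn=930)
for name, value in toy.items():
    print(f"{name}: {value:.3f}")

# Consistency check on the reported values: 0.669 + 0.989 - 1 = 0.658.
print(round(0.669 + 0.989 - 1, 2))  # 0.66
```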

We developed algorithms based on claims data and established an optimal algorithm for estimating childbirth. This validated algorithm can be used for accurate estimation of childbirth to clarify pregnancy- and childbirth-related diseases in future claims database studies.

The online version contains supplementary material available at 10.1186/s12884-026-08797-9.

## Full-text entities

- **Diseases:** pertussis (MESH:D014917), diabetes (MESH:D003920), coronary artery disease (MESH:D003324), abortion (MESH:D000026), placental abruption (MESH:D000037), miscarriage (MESH:D000022), preterm birth (MESH:D047928), Diseases (MESH:D004194), depression (MESH:D003866), death (MESH:D003643), stillbirths (MESH:D050497), osteoporosis (MESH:D010024)
- **Chemicals:** oxytocin (MESH:D010121), dinoprostone (MESH:D015232)
- **Species:** Homo sapiens (human, species) [taxon 9606]

---
Source: https://tomesphere.com/paper/PMC12998341