# Evaluating the generalizability of commercial healthcare claims data

**Authors:** Alex Dahlen, Yaowei Deng, Vivek Charu

PMC · DOI: 10.1093/aje/kwaf142 · American Journal of Epidemiology · 2025-06-30

## TL;DR

This paper shows that commercial healthcare claims data significantly underestimate inpatient discharge rates compared to reference data, with socioeconomic factors strongly influencing the bias.

## Contribution

The study quantifies external validity bias in commercial healthcare claims data and links it to socioeconomic and demographic factors.

## Key findings

- Commercial claims data underestimate the overall inpatient discharge rate by 23.1% when the target population is all Americans.
- About 25% of procedures have rates underestimated by a factor of 2 in claims data.
- Socioeconomic factors explain 69.4% of the bias variation (P < .001).

## Abstract

Commercial healthcare claims datasets are a nonrandom sample of the US population, which affects generalizability. Rigorous comparisons of claims-derived results to ground-truth data that quantify external validity bias are lacking. Our goal is (1) to quantify the external validity of commercial healthcare claims data and (2) to evaluate how socioeconomic/demographic factors are related to the bias. We analyzed inpatient discharge records occurring between January 1, 2019 and December 31, 2019 in five states: California, Iowa, Maryland, Massachusetts, and New Jersey, and compared rates (per person-year) of the 250 most common inpatient procedures between claims and reference data for each target population. We used the Merative MarketScan Commercial Database for the claims data and the State Inpatient Databases and the US Census as reference. For a target population of all Americans, commercial healthcare claims underestimate the rate of overall inpatient discharges by 23.1%. The extent of bias varied across procedures, with the rates of ~25% of procedures being underestimated by a factor of 2. Socioeconomic factors were significantly associated with the magnitude of bias ($R^2 = 69.4\%$, P < .001). When the target population was restricted to commercially insured Americans, the bias decreased substantially (1.4% of procedures were biased by more than a factor of 2), but some variation across procedures remained.
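The rate comparison described in the abstract can be illustrated with a small sketch. This is not the authors' code: the class and function names, the ratio-based definition of bias, and the numbers in the usage note are all assumptions made for illustration. It computes a per-person-year rate in each data source, takes the claims-to-reference ratio, and counts the share of procedures whose claims rate is low by at least a chosen factor.

```python
from dataclasses import dataclass


@dataclass
class ProcedureCounts:
    """Event counts and person-time for one procedure (hypothetical structure)."""
    name: str
    claims_events: int
    claims_person_years: float
    reference_events: int
    reference_person_years: float


def rate_ratio(p: ProcedureCounts) -> float:
    """Claims rate divided by reference rate, both per person-year.

    A ratio below 1 means the claims data underestimate the reference rate.
    """
    claims_rate = p.claims_events / p.claims_person_years
    reference_rate = p.reference_events / p.reference_person_years
    return claims_rate / reference_rate


def percent_underestimate(ratio: float) -> float:
    """Express a rate ratio as a percent underestimate (negative if overestimated)."""
    return (1.0 - ratio) * 100.0


def share_underestimated(ratios: list[float], factor: float = 2.0) -> float:
    """Fraction of procedures whose claims rate is low by at least `factor`.

    Assumption: 'underestimated by a factor of 2' is read as ratio <= 1/2.
    """
    if not ratios:
        raise ValueError("ratios must not be empty")
    threshold = 1.0 / factor
    return sum(r <= threshold for r in ratios) / len(ratios)


# Made-up counts, for illustration only; they are not taken from the paper.
EXAMPLE = [
    ProcedureCounts("procedure_a", 50, 1000.0, 100, 1000.0),
    ProcedureCounts("procedure_b", 80, 1000.0, 100, 1000.0),
    ProcedureCounts("procedure_c", 100, 1000.0, 100, 1000.0),
]
EXAMPLE_RATIOS = [rate_ratio(p) for p in EXAMPLE]
```

With these made-up counts, `procedure_a` has a rate ratio of 0.5 (a 50% underestimate) and is the only one of the three procedures that falls at or below the factor-of-2 threshold. Under this reading, the paper's 23.1% underestimate of overall discharges would correspond to a rate ratio of about 0.769.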

## Full-text entities

- **Diseases:** ACS (MESH:D003147), SID (MESH:D013398), varicella (MESH:D002644), SIDs (MESH:D018458), NDI (MESH:D012892), influenza (MESH:D007251), hip replacement (MESH:D025981), mumps (MESH:D009107), measles (MESH:D008457), Cancer (MESH:D009369), infectious diseases (MESH:D003141)
- **Chemicals:** oxygen (MESH:D010100)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12527232/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12527232/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12527232/full.md

---
Source: https://tomesphere.com/paper/PMC12527232