# Equivalence of Type 2 Diabetes Prevalence Estimates: Comparative Study of Similar Phenotyping Algorithms Using Electronic Health Record Data

**Authors:** Muchiri E Wandai, Katie S Allen, Ashley Wiensch, John Price, Brian E Dixon

PMC · DOI: 10.2196/79653 · JMIR Public Health and Surveillance · 2025-10-27

## TL;DR

This study compared two methods for estimating type 2 diabetes prevalence using electronic health records and found them mostly equivalent, except for Hispanic individuals.

## Contribution

The study demonstrates statistical equivalence of two phenotyping algorithms for T2D prevalence estimation in EHR data, with implications for harmonizing public health surveillance methods.

## Key findings

- Overall T2D prevalence estimates were 4.1% for CP 1 and 2.4% for CP 2.
- The two estimates were statistically equivalent within a margin of ±2.5 percentage points, except for Hispanic individuals.
- CP 1 included more data types (diagnoses, lab, medication) compared to CP 2, which used only diagnostic codes.

## Abstract

Timely surveillance of diabetes mellitus remains a challenge for public health agencies. In this study, researchers compared type 2 diabetes (T2D) prevalence estimates using electronic health record (EHR) data and computable phenotypes (CPs) as defined and applied by 2 independent networks. One network, Diabetes in Children, Adolescents, and Young Adults, is a research consortium, and the other, the Multi-State EHR-Based Network for Disease Surveillance, is a practice-based public health surveillance network.

This study sought to determine the equivalence of T2D prevalence estimates generated by 2 distinct, yet conceptually related, CPs using EHR data.

Each network used diagnostic, laboratory, and medication data for young adults (aged 18-44 years) extracted from the Indiana Network for Patient Care (INPC) to independently calculate prevalence of T2D using distinct CPs for the year 2022. The INPC is a statewide health information exchange that receives EHR data from multiple health care systems and supports public health use cases such as surveillance. The two one-sided tests method for equivalence with a predefined margin of –2.5 to +2.5 percentage points was used to compare the estimated prevalence as previously derived from the Multi-State EHR-Based Network for Disease Surveillance and Diabetes in Children, Adolescents, and Young Adults networks. The two one-sided tests procedure assesses whether any observed difference between 2 estimates is small enough to be practically insignificant. Results at the overall level, and stratified by sex, age, and race or ethnicity, were examined.
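The two one-sided tests procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes two independent samples and a Wald normal approximation, and the function name `tost_two_proportions` and all counts are hypothetical, chosen only so that the prevalences match the reported 4.1% and 2.4%. The summary does not give the actual denominators.

```python
from math import sqrt
from statistics import NormalDist


def tost_two_proportions(x1, n1, x2, n2, margin):
    """Two one-sided z-tests for the difference of two proportions.

    Assumes independent samples and a Wald standard error (a simplification).
    `margin` is on the proportion scale (0.025 = 2.5 percentage points).
    Returns (difference, p_value); a small p_value supports equivalence.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    norm = NormalDist()
    # Test 1: H0 is diff <= -margin; reject when diff is well above -margin.
    p_lower = 1 - norm.cdf((diff + margin) / se)
    # Test 2: H0 is diff >= +margin; reject when diff is well below +margin.
    p_upper = norm.cdf((diff - margin) / se)
    # Equivalence requires rejecting both nulls, so report the larger p-value.
    return diff, max(p_lower, p_upper)


# Hypothetical counts: 4,100 and 2,400 cases per 100,000 people.
diff, p_value = tost_two_proportions(4100, 100_000, 2400, 100_000, margin=0.025)
print(f"difference = {diff:.3f}, TOST p = {p_value:.3g}")

# Hypothetical small sample with a difference close to the margin.
diff_small, p_small = tost_two_proportions(54, 1000, 30, 1000, margin=0.025)
print(f"difference = {diff_small:.3f}, TOST p = {p_small:.3g}")
```

In this framing the null hypothesis is non-equivalence, so a small P value is evidence that the two estimates agree within the margin, while a large P value means equivalence could not be shown.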

Overall prevalence estimates for 2022 were 4.1% for CP 1 and 2.4% for CP 2. Although prevalence estimates for CP 1 were consistently higher than those for CP 2, absolute differences were generally less than 2.5 percentage points, and the two one-sided tests indicated equivalence within the predefined margin (P<.001). The only exception was for Hispanic individuals, for whom equivalence was not established (P=.2): prevalence was 5.4% for CP 1 versus 3.0% for CP 2, a difference of 2.4 (95% CI 2.2-2.6) percentage points, with the upper confidence limit exceeding the margin. Other groups with relatively large differences that nonetheless remained within the margin included male individuals (4.6% for CP 1 vs 2.3% for CP 2), individuals aged 35-44 years (6.9% for CP 1 vs 4.9% for CP 2), and African American individuals (5.5% for CP 1 vs 3.7% for CP 2). Therefore, we concluded that the 2 CPs largely produced equivalent estimates of T2D prevalence.
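The arithmetic behind these comparisons can be checked directly from the reported figures. The sketch below uses only the percentages and the confidence interval quoted in the abstract; the helper `ci_within_margin` is a hypothetical name, and comparing the reported 95% CI against the margin is a rough, conservative stand-in for the formal test (a two one-sided tests procedure at the 5% level corresponds to a 90% CI).

```python
MARGIN = 2.5  # equivalence margin in percentage points, from the abstract

# (CP 1 %, CP 2 %) prevalence as reported in the abstract.
reported = {
    "Overall": (4.1, 2.4),
    "Hispanic": (5.4, 3.0),
    "Male": (4.6, 2.3),
    "Aged 35-44 years": (6.9, 4.9),
    "African American": (5.5, 3.7),
}

# Absolute difference (CP 1 minus CP 2), rounded to one decimal place.
differences = {
    group: round(cp1 - cp2, 1) for group, (cp1, cp2) in reported.items()
}


def ci_within_margin(lower, upper, margin=MARGIN):
    """Return True when the whole interval lies strictly inside +/- margin."""
    return -margin < lower and upper < margin


# Reported CI for the Hispanic stratum: 2.2 to 2.6 percentage points.
hispanic_equivalent = ci_within_margin(2.2, 2.6)

for group, d in differences.items():
    print(f"{group}: difference = {d} percentage points")
print("Hispanic CI inside margin:", hispanic_equivalent)
```

Every point difference is below 2.5 percentage points, including the Hispanic stratum, so the exception arises from the interval rather than the point estimate: its upper limit of 2.6 crosses the margin.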

The 2 independent CPs demonstrated equivalent T2D prevalence estimates, except in Hispanic individuals. Although the CPs can be considered statistically equivalent, the data driving each CP may impact accuracy and completeness. CP 1 was broader, incorporating clinical diagnoses, laboratory data, and medication, whereas CP 2 used clinical diagnostic codes alone. These results have implications for improving harmonization of CPs for public health surveillance.

## Linked entities

- **Diseases:** type 2 diabetes (MONDO:0005148), diabetes mellitus (MONDO:0005015)

## Full-text entities

- **Diseases:** T2D (MESH:D003924), Diabetes (MESH:D003920)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12571427/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12571427/full.md

---
Source: https://tomesphere.com/paper/PMC12571427