# Importance of Diagnostic Accuracy in Big Data: False-Positive Diagnoses of Type 2 Diabetes in Health Insurance Claims Data of 70 Million Germans

**Authors:** Ralph Brinks, Thaddäus Tönnies, Annika Hoyer

PMC · DOI: 10.3389/fepid.2022.887335 · Frontiers in Epidemiology · 2022-05-23

## TL;DR

This study estimates the number of false-positive type 2 diabetes diagnoses in aggregated health insurance claims data covering about 70 million Germans. The false-positive ratio varies with age and is higher in women than in men.

## Contribution

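A novel method for estimating the proportion of false-positive diagnoses in large aggregated claims datasets that report prevalence, incidence, and mortality, based on mathematical relations from the illness-death model.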
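For background, a commonly used form of the illness-death model relates the age-specific prevalence $p$, the incidence $i$, and the mortality rates $m_0$ (people without the disease) and $m_1$ (people with the disease) at calendar time $t$ and age $a$ through the partial differential equation below. The full text describes how the paper turns relations of this kind into an estimate of false-positive diagnoses. The equation here is orientation only and is not presented as the authors' estimator.

$$
\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial a}\right) p(t,a) = \bigl(1 - p(t,a)\bigr)\Bigl(i(t,a) - p(t,a)\bigl(m_1(t,a) - m_0(t,a)\bigr)\Bigr)
$$

If only the general mortality $m = p\,m_1 + (1-p)\,m_0$ and the mortality rate ratio $R = m_1/m_0$ are available, then $m = m_0\bigl(1 + p(R-1)\bigr)$, which gives

$$
m_0 = \frac{m}{1 + p\,(R-1)}, \qquad m_1 - m_0 = \frac{m\,(R-1)}{1 + p\,(R-1)}.
$$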

## Key findings

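- The false-positive ratio (FPR) rises with age in men (about 1 to 3 per 1,000 at ages 30–50, below 4 per 1,000 at ages 50–80, about 5 per 1,000 after age 80). In women, it rises steeply to a peak of about 12 per 1,000 at ages 60–70 and then drops sharply.
- About 217,000 people in the dataset have a false-positive diabetes diagnosis (95% CI: 204,000–229,000). About 172,000 of them are women (95% CI: 162,000–180,000), and the FPR is higher in women than in men in every age group.
- The authors recommend handling possible false-positive (and false-negative) diagnoses in claims data explicitly, for example by including age- and sex-specific error terms in statistical models.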
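The absolute numbers above can be checked with simple arithmetic on the point estimates. The sketch below derives the implied number of men with a false-positive diagnosis and the women's share. It also shows how an FPR quoted per 1,000 converts to an expected count. Two assumptions apply here. First, the FPR is read as false positives per 1,000 insured people in an age group. Second, the population size is hypothetical and does not come from the paper. Confidence intervals cannot be combined this way: subtracting CI bounds does not give a valid CI for men.

```python
# Point estimates from the key findings (whole dataset).
total_fp = 217_000
women_fp = 172_000

# Implied number of men with a false-positive diagnosis (point estimate only;
# a confidence interval for men cannot be obtained by subtracting CI bounds).
men_fp = total_fp - women_fp
share_women = women_fp / total_fp


def expected_fp_count(fpr_per_1000: float, population: int) -> float:
    """Convert an FPR given per 1,000 people into an expected count.

    Assumes the FPR is expressed per 1,000 insured people in the group.
    """
    return fpr_per_1000 * population / 1000


print(men_fp)                # 45000
print(f"{share_women:.1%}")  # 79.3%
# Hypothetical group: 1,000,000 women aged 60-70 at the peak FPR of about 12 per 1,000.
print(expected_fp_count(12, 1_000_000))  # 12000.0
```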

## Abstract

Large data sets comprising diagnoses of chronic conditions are becoming increasingly available for research purposes. In Germany, aggregated claims data from the statutory health insurance, including medical diagnoses for roughly 70 million insurants, are planned to be published regularly. The validity of the diagnoses in such big datasets can hardly be assessed. If the dataset comprises prevalence, incidence, and mortality, the proportion of false-positive diagnoses can be estimated using mathematical relations from the illness-death model. We apply the method to age-specific aggregated claims data on type 2 diabetes from 70 million Germans, stratified by sex. We report the findings in terms of the age-specific ratio of false-positive diagnoses of type 2 diabetes (FPR) in the dataset. The FPR changes with age in both men and women. In men, the FPR increases linearly from 1 to 3 per 1,000 between ages 30 and 50 years. Between 50 and 80 years of age, the FPR remains below 4 per 1,000. After age 80 years, it increases to approximately 5 per 1,000. In women, the FPR increases steeply from age 30 to 60 years and peaks at approximately 12 per 1,000 between 60 and 70 years of age. After age 70 years, the FPR of women drops sharply. In all age groups, the FPR is higher in women than in men. In absolute numbers, 217,000 people in the dataset have a false-positive diagnosis (95% confidence interval, CI: 204,000–229,000). The vast majority are women (172,000, 95% CI: 162,000–180,000). Our work indicates that claims data analyses should deal appropriately with possible false-positive (and false-negative) diagnoses to avoid potentially biased or wrong conclusions. One way to do this is to include age- and sex-specific error terms in statistical models.
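The method rests on the premise that prevalence, incidence, and mortality must satisfy the illness-death relation. If prevalence is inflated by false positives, that inflation should leave a detectable trace. The sketch below illustrates this idea in the stationary case, where there are no calendar-time trends and only the age derivative remains: dp/da = (1 − p)(i − p(m1 − m0)). Every input is a hypothetical assumption: the incidence and mortality curves, a mortality rate ratio of 2, and a constant false-positive share of 4 per 1,000. The code only detects inconsistency through residuals. It does not estimate the FPR and is not the authors' estimator, which is described in the full text.

```python
import math


# Hypothetical age-specific rates per person-year (illustration only, not the paper's data).
def incidence(a: float) -> float:
    return 0.001 + 0.0002 * (a - 30)


def m0(a: float) -> float:
    # Mortality of people without diabetes (hypothetical exponential-in-age curve).
    return 0.0005 * math.exp(0.08 * (a - 30))


def m1(a: float) -> float:
    # Mortality of people with diabetes, assuming a rate ratio of 2 (hypothetical).
    return 2.0 * m0(a)


def rhs(p: float, a: float) -> float:
    # Right-hand side of dp/da = (1 - p) * (i - p * (m1 - m0)), stationary case.
    return (1.0 - p) * (incidence(a) - p * (m1(a) - m0(a)))


def simulate_prevalence(ages: list[int], p_start: float) -> list[float]:
    # Forward-Euler integration of the age-only equation on the given age grid.
    p = [p_start]
    for k in range(len(ages) - 1):
        h = ages[k + 1] - ages[k]
        p.append(p[-1] + h * rhs(p[-1], ages[k]))
    return p


def residuals(ages: list[int], prevalence: list[float]) -> list[float]:
    # Forward difference of the prevalence minus the model's right-hand side.
    # Values near zero mean the prevalence is consistent with the given rates.
    out = []
    for k in range(len(ages) - 1):
        h = ages[k + 1] - ages[k]
        slope = (prevalence[k + 1] - prevalence[k]) / h
        out.append(slope - rhs(prevalence[k], ages[k]))
    return out


ages = list(range(30, 91))
true_p = simulate_prevalence(ages, p_start=0.02)
# Hypothetical constant false-positive share of 4 per 1,000 on top of true prevalence.
observed_p = [p + 0.004 for p in true_p]

clean = max(abs(r) for r in residuals(ages, true_p))
contaminated = max(abs(r) for r in residuals(ages, observed_p))
print(f"max |residual| without false positives: {clean:.1e}")
print(f"max |residual| with false positives:    {contaminated:.1e}")
```

The integration scheme and the residual use the same forward step, so simulated data give residuals of zero up to rounding. Adding false positives changes p inside the right-hand side, which pushes the residuals away from zero. In this toy setup, the effect is largest at older ages, where m1 − m0 is large.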

## Linked entities

- **Diseases:** type 2 diabetes (MONDO:0005148)

## Full-text entities

- **Diseases:** death (MESH:D003643), Type 2 Diabetes (MESH:D003924)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10911003/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10911003/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/PMC10911003/full.md

---
Source: https://tomesphere.com/paper/PMC10911003