# Challenges and solutions in determining urolithiasis caseloads using the digital infrastructure of a clinical data warehouse

**Authors:** Martin Schönthaler, Noah Hempen, Maria Weymann, Maximilian Ferry von Bargen, Maximilian Glienke, Antonia Elsässer, Max Behrens, Harald Binder, Nadine Binder

PMC · DOI: 10.1371/journal.pone.0341068 · PLOS One · 2026-01-23

## TL;DR

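This study compares algorithmic queries on performance and reimbursement data with manual extraction from the hospital information system for identifying urolithiasis cases in a clinical data warehouse. Only the performance-data query matched the reference group.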

## Contribution

The study shows that urolithiasis case counts depend on the data source and the extraction mode, so clinical data extraction for research requires a clearly defined target variable and a thorough understanding of each source.

## Key findings

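- Algorithmic extraction from performance data identified cases correctly and completely when compared with the reference group.
- Case numbers from manual extraction from HIS data differed by 14%, owing to human errors and a lack of data availability from different wards.
- Case numbers from algorithmic extraction from reimbursement data differed by 12%, primarily because cases are merged in the context of reimbursement mechanisms.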
- Because the CDWH at MCUF is part of the German Medical Informatics Initiative (MII), the results can be transferred to other medical centers with a similar CDWH structure.

## Abstract

Background: To provide more evidence in urolithiasis research, we have established the German Nationwide Register for RECurrent URolithiasis (RECUR) using local clinical data warehouses (CDWH). For RECUR and other registers relying on digitalized clinical data, it is crucial to ensure the data’s reliability for answering scientific questions. In this work, we aim to compare the results of different CDWH-based queries on urolithiasis cases with manual case extraction from the primary source.

Methods: Sources for data extraction included the Medical Center University of Freiburg (MCUF) hospital information system (HIS), MCUF performance data (a clinical data set with merged data from patients including data from various time points throughout their treatment), and MCUF reimbursement data. We extracted data on caseloads in urolithiasis algorithmically (performance and reimbursement data) and compared those to a reference group compiled from manually extracted data from the local HIS and algorithmically extracted data.

Results: Algorithmic extraction based on performance data resulted in correct and complete case identification as compared to the reference group. The case numbers from manual extraction from HIS data and algorithmic extraction from reimbursement data differed by 14% and 12%, respectively. The reasons for deviations in HIS data included human errors and a lack of data availability from different wards. Deviations in reimbursement data arose primarily due to the merging of cases in the context of reimbursement mechanisms. As the CDWH at MCUF is part of the German Medical Informatics Initiative (MII), the results can be transferred to other medical centers with similar CDWH structure.

Conclusions: The current study provides firm evidence of the importance of clearly defining a study’s target variable, e.g., urolithiasis cases, and a thorough understanding of the data sources and modes used to extract the target data. Our work clearly shows that, depending on various data sources, a case is not a case is not a case.
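The comparison described in the Methods and Results can be pictured as a set comparison of case identifiers against the reference group. The following is a minimal sketch under stated assumptions, not the authors' extraction pipeline: the `Extraction` class, the `compare_to_reference` helper, the definition of deviation as the relative difference in case counts, and all case identifiers are hypothetical and chosen only for illustration. The numbers are invented toy data, not the study's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """Case identifiers returned by one extraction method (hypothetical structure)."""
    name: str
    case_ids: frozenset


def compare_to_reference(extraction, reference_ids):
    """Compare one extraction against the reference set of case identifiers.

    Deviation is assumed here to be the relative difference in case counts;
    the paper may define it differently.
    """
    missed = reference_ids - extraction.case_ids   # in reference, not extracted
    extra = extraction.case_ids - reference_ids    # extracted, not in reference
    deviation = abs(len(extraction.case_ids) - len(reference_ids)) / len(reference_ids)
    return {"missed": missed, "extra": extra, "deviation": deviation}


# Invented toy data: 100 reference cases.
reference = frozenset(range(1, 101))

# A source that reproduces the reference exactly.
performance = Extraction("performance", reference)

# A source in which 10 cases no longer appear separately,
# e.g. because they were merged into other cases.
reimbursement = Extraction("reimbursement", frozenset(range(1, 91)))

for extraction in (performance, reimbursement):
    result = compare_to_reference(extraction, reference)
    print(
        f"{extraction.name}: deviation {result['deviation']:.0%}, "
        f"missed {len(result['missed'])}, extra {len(result['extra'])}"
    )
```

Comparing identifier sets rather than bare counts has one practical advantage: it shows which cases are missing or surplus, which is the information needed to trace a deviation back to its cause, such as merged cases or data unavailable from a ward.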

## Linked entities

- **Diseases:** urolithiasis (MONDO:0024647)

## Full-text entities

- **Diseases:** hypertension (MESH:D006973), DIC (MESH:D000081042), Chest Pain (MESH:D002637), urinary tract infection (MESH:D014552), hydronephrosis (MESH:D006869), chronic kidney disease (MESH:D051436), sepsis (MESH:D018805), renal colic (MESH:D056844), Pain (MESH:D010146), arterial hypertension (MESH:D000081029), stone (MESH:D007669), urinary stones (MESH:D014545)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12829838/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12829838/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12829838/full.md

---
Source: https://tomesphere.com/paper/PMC12829838