# Identification of diagnostic discrepancies as a quality assurance measure in emergency medicine – a validation study

**Authors:** Thimo Marcin, Nadine Werthmüller, Fabian Kölbener, Martin Müller, Laura Zwaan, Stefanie C. Hautz, Alexander Schuster, Aristomenis K. Exadaktylos, Wolf E. Hautz

PMC · DOI: 10.1186/s13049-026-01572-x · Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine · 2026-02-11

## TL;DR

This study validates an automated, ICD-10-based method for detecting diagnostic discrepancies in emergency medicine, so that potential diagnostic errors can be flagged without a full chart review of every case.

## Contribution

The study validates an automated screening tool for identifying diagnostic discrepancies in emergency medicine using ICD-10 code similarity.

## Key findings

- The automated method showed high discriminative performance across all algorithms tested, with AUCs ranging from 0.94 to 0.95.
- Using the most sensitive cutoff in the simplest algorithm, all true discrepancies were detected, though 162 cases (15%) were falsely flagged as discrepant.
- The method shows promise as a practical tool to prioritize cases for detailed chart review.

## Abstract

Diagnostic errors are a major health care concern but remain difficult to study because their identification often requires resource-intensive chart reviews. We aimed to validate a previously proposed automated method for detecting discrepancies between an initial and a later, more definitive diagnosis as a screening tool for potential diagnostic errors in a large, prospective cohort of emergency department (ED) patients.

This secondary analysis included 1,204 patients enrolled in the DDxBRO randomized trial, which evaluated the effect of a diagnostic decision support tool on diagnostic quality in four Swiss emergency departments. For each patient, the ED diagnosis was extracted from the ED discharge letter, and the follow-up diagnosis at 14 days was obtained from hospital discharge letters or general practitioner notes. All diagnoses were coded using ICD-10 and manually classified for discrepancies by two blinded ED physicians according to a predefined scheme. The automated method calculated the “similarity” between ICD-10 codes for ED and follow-up diagnoses. The discriminative performance of this method in distinguishing between cases with and without diagnostic error was evaluated using receiver operating characteristic (ROC) curves, and sensitivity, specificity, and predictive values were assessed across multiple cutoffs.
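To make the idea of ICD-10 code "similarity" concrete, the following is a minimal sketch and not the authors' algorithm, which this summary does not specify. It assumes a simple prefix-based score: the share of leading characters that two codes have in common, compared over at most four characters. The function names `normalize`, `icd10_similarity`, and `flag_discrepant`, the `depth` parameter, and the default cutoff are all hypothetical and chosen for illustration only.

```python
def normalize(code: str) -> str:
    """Uppercase an ICD-10 code and drop the dot, e.g. 'i21.0' -> 'I210'."""
    return code.strip().upper().replace(".", "")


def icd10_similarity(ed_code: str, followup_code: str, depth: int = 4) -> float:
    """Share of leading characters two ICD-10 codes have in common (0.0 to 1.0).

    Illustrative assumption: only the first `depth` characters are compared,
    and the score is the shared prefix length divided by the longer code.
    """
    a = normalize(ed_code)[:depth]
    b = normalize(followup_code)[:depth]
    if not a or not b:
        raise ValueError("ICD-10 codes must not be empty")
    shared = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        shared += 1
    return shared / max(len(a), len(b))


def flag_discrepant(ed_code: str, followup_code: str, cutoff: float = 1.0) -> bool:
    """Flag a case for chart review when similarity falls below the cutoff."""
    return icd10_similarity(ed_code, followup_code) < cutoff


if __name__ == "__main__":
    # Same category, different subcategory: partially similar.
    print(icd10_similarity("I21.0", "I21.4"))
    # Different chapters: no shared prefix.
    print(icd10_similarity("I21.0", "J18.9"))
```

With a cutoff of 1.0, any pair of codes that is not identical over the compared characters is flagged, which corresponds to the most sensitive setting of this particular sketch. Lower cutoffs flag fewer cases.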

The automated method showed high and consistent discriminative performance across all algorithms tested, with areas under the ROC curve (AUCs) ranging from 0.94 to 0.95. Using the most sensitive cutoff in the simplest algorithm, all true discrepancies were detected, but 162 cases (15%) were incorrectly flagged as discrepant.
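The trade-off described above (every true discrepancy caught, at the cost of some false flags) can be illustrated with a small sketch of how sensitivity, specificity, predictive values, and AUC follow from similarity scores and manual labels. The data below are invented toy values, not study data, and the helper names `confusion_metrics` and `auc` are hypothetical. The AUC is computed with the standard pairwise (Mann-Whitney) formulation, using `1 - similarity` as the suspicion score.

```python
def confusion_metrics(similarities, truths, cutoff):
    """Screening metrics when cases with similarity < cutoff are flagged.

    `truths` holds the manual reference label: True = real discrepancy.
    """
    flagged = [s < cutoff for s in similarities]
    tp = sum(f and t for f, t in zip(flagged, truths))
    fp = sum(f and not t for f, t in zip(flagged, truths))
    fn = sum(not f and t for f, t in zip(flagged, truths))
    tn = sum(not f and not t for f, t in zip(flagged, truths))

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(similarities, truths):
    """AUC as the probability that a discrepant case scores as more
    suspicious (lower similarity) than a non-discrepant one; ties count 0.5."""
    pos = [1 - s for s, t in zip(similarities, truths) if t]
    neg = [1 - s for s, t in zip(similarities, truths) if not t]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only (not taken from the study).
    sims = [0.0, 0.5, 0.75, 1.0, 1.0, 0.75, 1.0, 0.25]
    truth = [True, True, True, False, False, False, False, True]
    print(confusion_metrics(sims, truth, cutoff=1.0))
    print(auc(sims, truth))
```

In this toy example, the strictest cutoff flags every true discrepancy but also one concordant case, mirroring the pattern reported in the results: perfect sensitivity with a modest number of false positives left for manual review.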

The automated method demonstrated high accuracy and shows promise as a practical screening tool to prioritize cases for resource-intensive chart review.

Trial registration: ClinicalTrials.gov NCT05346523.

The online version contains supplementary material available at 10.1186/s13049-026-01572-x.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12998101/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12998101/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12998101/full.md

---
Source: https://tomesphere.com/paper/PMC12998101