# Application of an Externally Developed Algorithm to Identify Research Cases and Controls from EHR Data: Trials and Triumphs

**Authors:** Nelly Estefanie Garduno-Rapp, Simone Herzberg, Henry H. Ong, Cindy Kao, Christoph U. Lehmann, Srushti Gangireddy, Nitin B Jain, Ayush Giri

PMC · DOI: 10.1055/a-2524-5216 · 2025-03-26

## TL;DR

Researchers applied an EHR-based algorithm developed at a tertiary medical center in Tennessee to a rotator cuff tear case-control study at a tertiary medical center in North Texas, and validated it against a clinician-recorded gold standard.

## Contribution

The study demonstrates successful cross-center implementation of a rule-based algorithm for EHR data classification and highlights the importance of standardization.

## Key findings

- The algorithm initially classified 398 of 492 patients (80.9%) correctly; after fine-tuning and correction of gold standard errors, sensitivity was 0.94 and specificity was 0.76.
- The process revealed 12 data entry errors in the gold standard dataset.
- Code variability between centers necessitated algorithm refinement for better performance.

## Abstract

The use of electronic health records (EHRs) in research demands robust and interoperable systems. By linking biorepositories to EHR algorithms, researchers can efficiently identify cases and controls for large observational studies (e.g., genome-wide association studies). This is critical for ensuring efficient and cost-effective research. However, the lack of standardized metadata and algorithms across different EHRs complicates their sharing and application. Our study presents an example of a successful implementation and validation process.

This study aimed to implement and validate a rule-based algorithm from a tertiary medical center in Tennessee to classify cases and controls from a research study on rotator cuff tear (RCT) nested within a tertiary medical center in North Texas and to assess the algorithm's performance.

We applied a phenotypic algorithm (designed and validated in a tertiary medical center in Tennessee) to EHR data from 492 patients enrolled in a case-control study at a tertiary medical center in North Texas. The algorithm used International Classification of Diseases (ICD) and Current Procedural Terminology (CPT) codes to assign case or control status for degenerative RCT. A manual review compared the algorithm's classification with a gold standard previously recorded by clinical researchers.
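The code-based classification described above can be sketched roughly as follows. This is a minimal illustration, not the study's algorithm: the `PatientRecord` type, the `classify` and `compare_with_gold` helpers, and the two-entry data dictionary are all hypothetical, and the rule "any matching diagnosis or procedure code means case" is a simplifying assumption. The actual code lists and any exclusion rules are not given in this summary.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical data dictionary, for illustration only (not the study's code lists).
CASE_ICD_PREFIXES = ("M75.1",)  # ICD-10-CM family: non-traumatic rotator cuff tear
CASE_CPT_CODES = {"29827"}      # CPT: arthroscopic rotator cuff repair


@dataclass
class PatientRecord:
    patient_id: str
    icd_codes: set[str] = field(default_factory=set)
    cpt_codes: set[str] = field(default_factory=set)


def classify(record: PatientRecord) -> str:
    """Label a patient 'case' if any diagnosis or procedure code matches."""
    has_diagnosis = any(
        code.startswith(CASE_ICD_PREFIXES) for code in record.icd_codes
    )
    has_procedure = bool(record.cpt_codes & CASE_CPT_CODES)
    return "case" if (has_diagnosis or has_procedure) else "control"


def compare_with_gold(
    records: list[PatientRecord], gold: dict[str, str]
) -> list[str]:
    """Return the ids where the rule disagrees with the gold standard."""
    return [r.patient_id for r in records if classify(r) != gold[r.patient_id]]
```

Disagreements returned by `compare_with_gold` are the records a manual reviewer would inspect; in such a review, a mismatch may point to a gap in the code list or to an error in the gold standard itself.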

Initially, the algorithm correctly identified 398 of 492 patients (80.9%) as cases or controls. After fine-tuning the algorithm and correcting errors in our gold standard dataset, we calculated a sensitivity of 0.94 and a specificity of 0.76. Implementing the algorithm was challenging because coding practices varied between the medical centers. To enhance performance, we refined the algorithm's data dictionary by incorporating additional codes. The process highlighted the need for meticulous code verification and standardization in multi-center studies.
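For readers who want the arithmetic behind these metrics, here is a small sketch. Only the 398-of-492 figure comes from the text; the confusion-matrix counts in the second half are toy values chosen for illustration, since the study's actual counts are not given in this summary. Function names are hypothetical.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of patients whose label matches the gold standard."""
    return correct / total


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of gold standard cases that the algorithm labels as cases."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of gold standard controls that the algorithm labels as controls."""
    return true_neg / (true_neg + false_pos)


# Reported initial agreement: 398 of 492 patients.
print(f"{accuracy(398, 492):.1%}")  # 80.9%

# Toy counts (NOT from the study) to show how the two metrics are formed.
print(sensitivity(true_pos=8, false_neg=2))  # 0.8
print(specificity(true_neg=7, false_pos=3))  # 0.7
```

Sensitivity and specificity are computed within the gold standard case and control groups separately, so correcting gold standard labels changes both the numerators and the denominators.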

Sharing case-control algorithms across institutions strengthens EHR research. Our rule-based algorithm improved multi-site patient identification and revealed 12 data entry errors in the gold standard dataset, helping validate our results.

## Full-text entities

- **Diseases:** rotator cuff tear (MESH:D000070636)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11945218/full.md

---
Source: https://tomesphere.com/paper/PMC11945218