# Are Machine Learning methods effective in detecting undiagnosed atrial fibrillation in primary care settings using electronic health records? A systematic review

**Authors:** Mhd Diaa Chalati, Chetan Shirvankar, Genevieve Gore, Abhinav Sharma, Samira Abbasgholizadeh-Rahimi

PMC · DOI: 10.1371/journal.pdig.0001009 · PLOS Digital Health · 2025-10-14

## TL;DR

This review evaluates how well machine learning models using electronic health records can detect undiagnosed atrial fibrillation in primary care, finding promise but also significant limitations.

## Contribution

The study is the first systematic review to assess the effectiveness of EHR-based ML models for AF detection in primary care settings.

## Key findings

- EHR-based ML models show potential for detecting undiagnosed AF, with AUROC ranging from 0.71 to 0.948.
- Only 25% of studies underwent external validation, and 53% were at high risk of bias.
- Combining ML with clinical tools improved discrimination compared to ML models alone.
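
For readers unfamiliar with the discrimination metric reported above, AUROC can be read as the probability that a model assigns a higher risk score to a randomly chosen patient with AF than to a randomly chosen patient without it. The sketch below illustrates that definition with a pairwise comparison. The function name `auroc` and the toy labels and scores are made up for illustration and are not taken from any reviewed study.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of AUROC.

    Returns the fraction of positive/negative pairs in which the positive
    case has the higher score; tied pairs count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = AF later confirmed, 0 = no AF.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

toy_auroc = auroc(labels, scores)
print(round(toy_auroc, 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported range of 0.71 to 0.948 should be read.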

## Abstract

Atrial fibrillation (AF) increases the risk of stroke, heart failure and mortality. Current screening guidelines fail to detect AF effectively, and existing models have limited applicability in primary care. Electronic health records (EHRs) provide an opportunity to apply machine learning (ML) for automated AF detection; however, the performance of such models relative to standard care remains unclear. We conducted a systematic review to evaluate the effectiveness, quality, and applicability of EHR-based ML models for detecting AF in primary care. The review is informed by Joanna Briggs Institute and Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. We searched seven databases from inception to May 2023. Eligible studies involved adults in primary care where ML models using EHRs were compared to standard care. The primary outcome was the detection of undiagnosed AF; secondary outcomes examined impacts on patients, healthcare providers, and systems. Data were extracted using CHARMS; risk of bias and applicability were evaluated using the PROBAST and MI-CLAIM checklists. This review was registered in the International Prospective Register of Systematic Reviews (CRD42023390603). From 4,536 references screened, 16 studies were included. Among these, 14 (87%) were retrospective cohort studies, one (6%) was prospective, and one (6%) was a randomized controlled trial. Random forest classifiers were the most common ML model (7 studies, 43%). Only 4 studies (25%) underwent external validation, and 8 (53%) were at high risk of bias. Model discrimination (AUROC) ranged from 0.71 to 0.948, with 8 (50%) outperforming controls. Combining ML with clinical tools (3 studies, 19%) significantly improved discrimination compared to ML models alone. Reviewed models identified gout as a nontraditional predictor of AF and demonstrated that dynamic measures of BMI, blood pressure, and heart failure diagnosis were stronger predictors than static measures.
EHR-based ML models show promise for improving AF detection in primary care compared to standard care. Their clinical applicability, however, is limited by insufficient external validation, high risk of bias, and variable performance. Future research should prioritize external validation, evaluation in clinical trials and the integration of predictors routinely available in primary care.

Atrial fibrillation is a common condition that significantly increases the risk of serious health problems such as strokes and heart failure. Despite its impact, it often goes undiagnosed in its early stages due to the lack of reliable clinical tools, leading to thousands of preventable hospitalizations each year. While machine learning has shown potential in improving detection, much of the research has focused on models using electrocardiogram data. In contrast, our review emphasizes the use of electronic health records, a widely available yet underutilized resource in primary care, for automated risk assessment. In this review, we examined how machine learning models based on electronic health records could improve the detection of undiagnosed atrial fibrillation. By using routinely collected health information, we showed that these tools could identify patients at risk earlier and more accurately, enabling timely interventions that improve health outcomes. However, their widespread use is limited by challenges such as inconsistent performance, insufficient testing in diverse real-world settings, and biases in data. Addressing these limitations is crucial to realizing the full potential of this approach. Our review advances the field by synthesizing evidence, identifying critical gaps, and providing a roadmap for future research. By emphasizing robust testing, active collaboration with healthcare providers and patients, and tailoring these tools to primary care needs, our work lays the foundation for making machine learning a trusted and practical solution for early detection of atrial fibrillation.

## Linked entities

- **Diseases:** atrial fibrillation (MONDO:0004981), gout (MONDO:0005393), heart failure (MONDO:0005252)

## Full-text entities

- **Diseases:** heart failure (MESH:D006333), gout (MESH:D006073), stroke (MESH:D020521), AF (MESH:D001281)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12520348/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12520348/full.md

## References

54 references — full list in the complete paper: https://tomesphere.com/paper/PMC12520348/full.md

---
Source: https://tomesphere.com/paper/PMC12520348