# Performance of predictive AI-based clinical decision support systems across clinical domains: A systematic review and meta-analysis

**Authors:** William J. Waldock, Ahmad Guni, Ara Darzi, Hutan Ashrafian

PMC · DOI: 10.1371/journal.pdig.0001310 · PLOS Digital Health · 2026-03-24

## TL;DR

This study reviews how well predictive AI tools support clinical decisions across 17 medical specialties, finding moderate discrimination (pooled AUC: 0.652) and high specificity (0.819) but highlighting the need for better real-world testing.

## Contribution

The study introduces the ROADMAP framework to bridge the gap between AI performance and real-world clinical integration.

## Key findings

- AI-based CDSS showed moderate discriminatory ability (AUC: 0.652) and high specificity (0.819) across 17 medical specialties.
- Most studies (76%) were retrospective, and only 24% involved prospective deployment, highlighting a gap in real-world validation.
- The ROADMAP framework is proposed to guide the development and evaluation of AI tools for clinical integration.

## Abstract

Despite advances in deep learning and transformer architectures, prior reviews have focused narrowly on traditional clinical decision support systems (CDSS) or single medical domains, leaving significant gaps in understanding contemporary AI-driven predictive tools. This systematic review and meta-analysis evaluated the predictive performance of artificial intelligence-based CDSS (AI-CDSS) across multiple medical specialties. Following PRISMA guidelines, PubMed and Cochrane Library were searched through December 2024 for studies evaluating predictive AI-CDSS using real-world clinical data. Two reviewers independently screened 3,296 records (κ = 0.833), with study quality assessed via QUADAS-2 and performance measures pooled using random-effects meta-analysis. Fifty studies spanning 17 medical specialties were included. Meta-analysis demonstrated moderate discriminatory ability (pooled AUC: 0.652, 95% CI: 0.562–0.743), high specificity (0.819, 95% CI: 0.793–0.844), moderate accuracy (0.765, 95% CI: 0.734–0.796), and variable sensitivity (0.660, 95% CI: 0.535–0.785), with substantial heterogeneity across all measures (I² ≥ 98.9%). Only 24% of studies involved prospective deployment, and 64% reported exclusively technical metrics without clinical workflow data. Predictive AI-CDSS demonstrate moderate-to-good diagnostic performance with strong specificity; however, the predominance of retrospective study designs and limited implementation reporting reveal critical gaps between technical validation and real-world clinical utility. To address these shortcomings, we propose the ROADMAP framework, structured around seven domains: Representative development, Outcomes-focused evaluation, Assessment for deployment, Data harmonization, Monitoring for bias, Allocation via economic evaluations, and Priorities for standardized reporting and prospective validation. 
This framework provides a practical roadmap for bridging the gap between algorithmic performance and meaningful clinical integration.
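The abstract reports pooled estimates from a random-effects meta-analysis together with I² heterogeneity. As a minimal sketch of what that kind of pooling involves, the block below uses the DerSimonian-Laird estimator. This is an assumption: the summary does not say which estimator or software the authors used, or whether metrics were transformed before pooling. The function name, the `PooledEstimate` type, and the input values are hypothetical and are not data from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class PooledEstimate:
    pooled: float   # random-effects pooled estimate
    ci_low: float   # lower bound of the confidence interval
    ci_high: float  # upper bound of the confidence interval
    tau2: float     # estimated between-study variance
    i2: float       # I^2 heterogeneity, in percent


def random_effects_pool(effects, variances, z=1.96):
    """Pool per-study estimates with the DerSimonian-Laird method.

    Assumption: this estimator is illustrative only; the source does not
    name the one used in the paper.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures dispersion of studies around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # Between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_ws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_ws
    se = math.sqrt(1.0 / sum_ws)

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return PooledEstimate(pooled, pooled - z * se, pooled + z * se, tau2, i2)


if __name__ == "__main__":
    # Hypothetical per-study AUCs and variances, for illustration only.
    aucs = [0.55, 0.62, 0.71, 0.80]
    variances = [0.0004, 0.0009, 0.0006, 0.0005]
    print(random_effects_pool(aucs, variances))
```

When studies disagree far more than their individual variances would predict, `tau2` grows and I² approaches 100%, which is the pattern the abstract describes (I² ≥ 98.9%).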

In our study, we set out to understand how well modern Artificial Intelligence (AI) assists doctors in making clinical decisions across a wide range of medical specialties. While AI technology has advanced rapidly, we realized that previous research was often too narrow or outdated to show the full picture of these modern predictive tools.

After reviewing 50 studies covering 17 different medical fields, we found that current AI tools demonstrate moderate to good accuracy. They are particularly effective at correctly identifying when a patient does not have a condition (high specificity). However, they are less consistent at catching every positive case, and their performance varies significantly depending on the setting.
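To make the distinction between these measures concrete, here is a minimal sketch of how sensitivity, specificity, and accuracy are computed from a confusion matrix. The function name and the counts are hypothetical, chosen only for illustration; they are not taken from any study in the review.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    positives = tp + fn  # patients who truly have the condition
    negatives = tn + fp  # patients who truly do not have the condition
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        # Share of true cases the tool catches.
        "sensitivity": tp / positives,
        # Share of unaffected patients the tool correctly clears.
        "specificity": tn / negatives,
        # Share of all predictions that are correct.
        "accuracy": (tp + tn) / (positives + negatives),
    }


if __name__ == "__main__":
    # Hypothetical counts for a single tool evaluated on 300 patients.
    print(diagnostic_metrics(tp=66, fp=36, tn=164, fn=34))
```

A tool can score well on specificity while still missing a sizeable share of true cases, which is why the three measures are reported separately.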

Crucially, we identified a major gap between technical success and real-world usefulness. Most studies tested AI on historical data rather than in live hospital environments, often ignoring how these tools fit into a doctor’s actual workflow.

To address this, we developed the ROADMAP framework while applying our findings to a case study of antimicrobial resistance. This seven-domain guide outlines how researchers can move beyond simple math scores to create AI tools that are representative, fair, economically viable, and proven to work in actual patient care scenarios.

## Full-text entities

- **Genes:** ITIH2 (inter-alpha-trypsin inhibitor heavy chain 2) [NCBI Gene 3698] {aka H2P, ITI-HC2, SHAP}
- **Diseases:** Infection (MESH:D007239), AMR (MESH:C565965), AI (MESH:C538142), cirrhosis (MESH:D005355), CDSS (MESH:D020195), sepsis (MESH:D018805), Infectious Diseases (MESH:D003141), nosocomial infections (MESH:D003428)
- **Species:** Enterobacterales (order) [taxon 91347], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13012507/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13012507/full.md

## References

146 references — full list in the complete paper: https://tomesphere.com/paper/PMC13012507/full.md

---
Source: https://tomesphere.com/paper/PMC13012507