# Artificial intelligence agents in healthcare research: A scoping review

**Authors:** Basile Njei, Yazan A. Al-Ajlouni, Ulrick Sidney Kanmounye, Sarpong Boateng, Guy Loic Nguefang, Nelvis Njei, Shadi Hamouri, Ahmad F. Al-Ajlouni

PMC · DOI: 10.1371/journal.pone.0342182 · PLOS One · 2026-02-10

## TL;DR

This paper reviews how AI agents are being used in healthcare, finding that most research is still in early stages and lacks real-world testing.

## Contribution

The study provides a comprehensive scoping review of AI agent research in healthcare, highlighting gaps in clinical validation and ethical governance.

## Key findings

- Most AI agent studies in healthcare focus on simulated environments rather than real-world clinical settings.
- Agentic AI systems are primarily designed for decision-making and workflow automation but lack robust evaluation of clinical outcomes.
- External tool use and iterative self-correction mechanisms are central to current AI agent architectures.

## Abstract

Artificial Intelligence (AI) agents are rapidly transforming healthcare delivery, enabling real-time decision support and sophisticated patient interaction at scale. However, the scientific landscape of this fast-growing, multidisciplinary field remains fragmented, with technical innovation outpacing translational research and the establishment of ethical governance frameworks. To address this gap, we conducted a comprehensive scoping review of AI agent research in healthcare.

We followed the PRISMA-ScR guidelines for scoping reviews. We searched PubMed, Web of Science, arXiv, and medRxiv, covering January 2015 to December 7, 2025.

The search identified 1,070 records, of which 43 studies were included after full-text review. Of these, 36 were published in 2025. Systems were categorized as conversational agents (8), workflow/automation assistants (17), and multimodal decision support agents (18). Two core mechanisms recurred across all archetypes: external tool use (e.g., retrieval-augmented generation or code execution) for grounding, and iterative self-correction (e.g., multi-agent debate or self-debugging loops) for refinement. Evaluation settings were predominantly simulated environments or laboratory studies, with few clinical pilots or real-world deployments. Primary reported outcomes focused on process measures (efficiency) and diagnostic accuracy; clinical outcomes and safety endpoints were rarely addressed.
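The counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the abstract. The percentages it derives are not reported in the source and are shown purely as a consistency check, and the variable and function names are illustrative.

```python
# Counts as reported in the abstract.
records_identified = 1070
included = 43
published_2025 = 36
archetypes = {
    "conversational agents": 8,
    "workflow/automation assistants": 17,
    "multimodal decision support agents": 18,
}


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The three archetype counts should add up to the number of included studies.
archetype_total = sum(archetypes.values())

# Derived shares (not stated in the source; computed here for orientation).
inclusion_rate = share(included, records_identified)
share_2025 = share(published_2025, included)
archetype_shares = {name: share(n, included) for name, n in archetypes.items()}

print(f"archetype total: {archetype_total} of {included} included studies")
print(f"included after screening: {inclusion_rate}% of records")
print(f"published in 2025: {share_2025}% of included studies")
for name, pct in archetype_shares.items():
    print(f"{name}: {pct}%")
```

The archetype counts sum to the 43 included studies, so the three categories partition the included set without overlap.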

Agentic AI systems are rapidly evolving from conceptual frameworks to functional prototypes, primarily targeting complex decision-making and workflow automation. While agentic capabilities are increasingly integrated, research heavily favors simulated evaluations. Future research must prioritize clinical trials and the robust assessment of safety, usability, and clinical efficacy before widespread adoption.
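To make the pattern named in the results concrete (external tool use for grounding, plus iterative self-correction), here is a minimal, hypothetical sketch of a self-debugging loop built around a code-execution tool. It is not taken from any of the reviewed systems. `stub_model`, `execute`, and `run_agent` are illustrative names, and the "model" is a hard-coded stand-in rather than a real language model.

```python
from typing import Callable, List, Optional, Tuple


def execute(code: str) -> Tuple[bool, str]:
    """Toy code-execution tool: run `code` and report `result` or the error."""
    namespace = {"values": [2, 4, 6]}
    try:
        # Acceptable for a toy example; never run untrusted code this way.
        exec(code, namespace)
        return True, str(namespace["result"])
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def stub_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for a language model.

    The first draft contains a typo; once it sees error feedback,
    the model returns a corrected draft.
    """
    if feedback is None:
        return "result = sum(values) / len(value)"
    return "result = sum(values) / len(values)"


def run_agent(
    task: str,
    model: Callable[[str, Optional[str]], str],
    tool: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Draft, call the tool, and revise on failure, up to `max_rounds` times."""
    feedback: Optional[str] = None
    trace: List[Tuple[str, str]] = []
    for _ in range(max_rounds):
        draft = model(task, feedback)       # propose or revise a draft
        ok, observation = tool(draft)       # ground the draft by running it
        trace.append((draft, observation))
        if ok:
            return observation, trace
        feedback = observation              # feed the error into the next round
    return None, trace                      # give up after max_rounds


answer, trace = run_agent("Compute the mean of values", stub_model, execute)
print(answer)
for draft, observation in trace:
    print(f"{draft!r} -> {observation}")
```

In this toy run the first draft fails inside the tool, the error message is passed back as feedback, and the second draft succeeds. The `max_rounds` cap is a design choice that keeps the loop from running indefinitely when no draft passes.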

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12890167/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12890167/full.md

## References

58 references — full list in the complete paper: https://tomesphere.com/paper/PMC12890167/full.md

---
Source: https://tomesphere.com/paper/PMC12890167