# Automated and Artificial Intelligence (AI)-Derived Performance Assessment in Surgical Simulation: A Systematic Review

**Authors:** Ahmad Khalifa, Owais Tahhan, Mohammed Albazooni, Mohammed Saeed, Ruha Hamdi, Megan Stanners, Amman Malik, Adnan Malik

PMC · DOI: 10.7759/cureus.100477 · Cureus · 2025-12-31

## TL;DR

This systematic review of 29 studies examines how AI and automated methods are used to assess surgical technical skills. It finds the approach promising, but the methods are heterogeneous and their validity and reliability are inconsistently documented.

## Contribution

The study systematically evaluates the use of AI in surgical performance assessment, highlighting gaps in standardization and validation.

## Key findings

- Most studies used supervised learning algorithms for technical skill assessment.
- Performance measures varied widely, with limited documentation of validity and reliability.
- Few studies included real-time adaptive feedback systems.

## Abstract

Artificial intelligence (AI)-assisted and automated performance assessment is increasingly being incorporated into surgical education, yet the degree, efficacy, and trustworthiness of this performance assessment are unknown. A systematic review of the literature published between 2010 and 2025 on PubMed, Scopus, Embase, and IEEE Xplore (including conference proceedings and gray literature) was completed to identify experimental and observational studies that reported the use of algorithms or automated methods of assessment in technical skill assessment in both simulator-based training and real clinical practice. Information was extracted on study characteristics, algorithm type, task complexity, performance measures, evidence of validity and reliability, and the quality of studies, assessed using relevant tools, with results reported descriptively and trends reviewed by study type, year, country, and task domain. Twenty-nine studies met the inclusion criteria, with most using supervised learning algorithms to evaluate technical skills; performance measures ranged widely, and studies were inconsistent in documenting the validity and reliability of the assessments. Few studies presented a real-time adaptive feedback system. The literature reflects a shift toward simulation-based assessment and the increasing use of multimodal data sources. Still, methodological heterogeneity, poor transparency, and a lack of validity and reliability prevent generalisability across tasks and contexts. AI-based assessment has potential in surgical education, with real-time adaptive assessment particularly interesting; however, standardised, validated methods are required to ensure reproducible measures of performance and ethical implementation in surgical simulation training.

## Full-text entities

- **Diseases:** brain tumour (MESH:D001932)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12807625/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12807625/full.md

## References

55 references — full list in the complete paper: https://tomesphere.com/paper/PMC12807625/full.md

---
Source: https://tomesphere.com/paper/PMC12807625