# From gaze to proficiency: deep learning-driven prediction of novice performance in laparoscopic training using AOI-dependent metrics

**Authors:** Aseel F. Khanfar, Sanaz Motamedi, Shawn D. Safford, Jason Moore, Jessica Menold, Scarlett Miller

PMC · DOI: 10.1007/s00464-025-12369-x · Surgical Endoscopy · 2025-12-05

## TL;DR

This study combines computer-vision deep learning with eye-tracking to classify and predict the skill levels of novice trainees performing a laparoscopic peg transfer task. It finds that novices’ visual attention patterns are consistent across adult and pediatric box trainers.

## Contribution

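The study introduces AOI-dependent metrics, extracted automatically by integrating Computer Vision-Deep Learning (CV-DL) with eye-tracking, and uses them together with motion metrics to classify and predict skill levels in laparoscopic training, as a step toward real-time objective assessment.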

## Key findings

- AOI-dependent and motion metrics successfully classified novices into high and mid-low skill levels.
- Random Forest predicted visual behavior most accurately among the ML models compared; by Gini importance, fixation rates on objects and tool speed were the key predictors.
- Visual attention patterns were consistent between pediatric and adult box trainers among novices.

## Abstract

The Fundamentals of Laparoscopic Surgery (FLS) program uses box trainers to develop laparoscopic skills. However, these simulators lack personalized training and real-time objective assessment, and they primarily represent adult anatomies, neglecting pediatric cases. To address these limitations, advanced objective evaluations such as motion analysis and eye-tracking are needed to track trainees’ progress and provide real-time formative feedback. Dynamic training environments complicate eye-tracking data extraction, though, because the areas of interest (AOIs) shift during the task. This study aimed to extract AOI-dependent and motion metrics for differentiating and predicting trainees’ skill levels across different box trainer anatomies.

Medical students and residents performed the peg transfer task on adult and pediatric box trainers. Computer Vision-Deep Learning (CV-DL) algorithms were integrated with eye-tracking data to automatically detect AOIs and extract AOI-dependent metrics (fixation rates on objects and tools) and motion metrics (tool speed). K-means clustering was used to differentiate trainees’ skill levels. To determine which technique most accurately predicts trainees’ visual attention patterns, we compared multiple Machine Learning (ML) techniques, including Random Forest, Support Vector Machine, Artificial Neural Networks, and Decision Trees.
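The two kinds of metric described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that a detector has already produced one bounding box per AOI per video frame, that "fixation rate" means fixations landing on an AOI per second of task time, and that tool speed is the frame-to-frame displacement of the tool's box centre. The names `Box`, `fixation_rate`, `mean_tool_speed`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in scene-video pixel coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def fixation_rate(fixations, boxes_per_frame, label, duration_s):
    """Fixations landing on the AOI `label`, per second of task time.

    fixations: list of (frame_index, x, y) tuples.
    boxes_per_frame: dict frame_index -> dict label -> Box (assumed detector output).
    """
    hits = 0
    for frame, x, y in fixations:
        # The AOI moves, so look up its box for the frame of this fixation.
        box = boxes_per_frame.get(frame, {}).get(label)
        if box is not None and box.contains(x, y):
            hits += 1
    return hits / duration_s


def mean_tool_speed(boxes_per_frame, label, fps):
    """Mean speed (pixels/second) of the box centre for `label`,
    using only neighbouring frames in which the label was detected."""
    frames = sorted(boxes_per_frame)
    speeds = []
    for prev, curr in zip(frames, frames[1:]):
        a = boxes_per_frame[prev].get(label)
        b = boxes_per_frame[curr].get(label)
        if a is None or b is None:
            continue  # skip gaps where the detector missed the tool
        (ax, ay), (bx, by) = a.center(), b.center()
        dt = (curr - prev) / fps
        speeds.append(hypot(bx - ax, by - ay) / dt)
    return sum(speeds) / len(speeds) if speeds else 0.0


# Hypothetical three-frame clip: the tool moves, the peg stays put.
boxes = {
    0: {"tool": Box(0, 0, 10, 10), "peg": Box(100, 100, 120, 120)},
    1: {"tool": Box(3, 4, 13, 14), "peg": Box(100, 100, 120, 120)},
    2: {"tool": Box(6, 8, 16, 18)},
}
fixations = [(0, 110, 110), (1, 8, 9), (2, 110, 110)]

peg_rate = fixation_rate(fixations, boxes, "peg", duration_s=2.0)
tool_rate = fixation_rate(fixations, boxes, "tool", duration_s=2.0)
tool_speed = mean_tool_speed(boxes, "tool", fps=30)
```

Looking up the AOI box per frame, rather than using a fixed screen region, is what makes the metric "AOI-dependent" in a scene where objects and tools keep moving.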

The extracted metrics successfully classified novices into High and Mid-Low skill levels, with significant differences in all extracted metrics between visual behavior levels (p < 0.05). Random Forest achieved the highest accuracy for visual behavior prediction, and Gini importance identified fixation rates on objects and tool speed as the key predictors. Novices’ visual attention was consistent between pediatric and adult box trainers (p > 0.05).
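Gini importance ranks a feature by how much the splits made on it reduce Gini impurity, summed over the trees of a forest. The sketch below shows only that underlying quantity for a single threshold split, not a full Random Forest and not the authors' model. The function names, the feature values, and the thresholds are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Drop in Gini impurity from splitting samples at `value <= threshold`.

    The children's impurities are weighted by their share of the samples.
    """
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical trainees: two labelled High, two labelled Mid-Low.
labels = ["High", "High", "Mid-Low", "Mid-Low"]
object_fixation_rate = [0.9, 0.8, 0.3, 0.2]  # separates the groups cleanly
unrelated_feature = [10, 30, 20, 40]         # does not separate them

informative = impurity_decrease(object_fixation_rate, labels, threshold=0.5)
uninformative = impurity_decrease(unrelated_feature, labels, threshold=25)
```

In this toy data the split on `object_fixation_rate` removes all impurity while the split on `unrelated_feature` removes none, which is the kind of contrast an importance ranking reflects.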

These findings indicate that novices’ skill levels may differ even in early-stage training, and that the extracted metrics have the potential to classify and predict novices’ skill levels and visual behavior. This matters for customizing and adapting training programs to enhance trainees’ performance.

## Full-text entities

- **Diseases:** DL (MESH:C537113), postoperative pain (MESH:D010149)
- **Chemicals:** diamond (MESH:D018130)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12971865/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12971865/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/PMC12971865/full.md

---
Source: https://tomesphere.com/paper/PMC12971865