# Multimodal deep learning for objective skill assessment in robot-assisted vesico-urethral anastomosis

**Authors:** Somayeh B. Shafiei, Saeed Shadpour, Anthony Dakwar, Zhaomin Xu, James L. Mohler

PMC · DOI: 10.1007/s11701-026-03290-z · Journal of Robotic Surgery · 2026-03-10

## TL;DR

This study combines electroencephalogram (EEG) and eye-tracking data with deep learning to classify surgeons' skill levels objectively during robot-assisted vesico-urethral anastomosis.

## Contribution

A multimodal deep learning approach (CNN and LSTM models) that classifies robot-assisted surgery skill levels from EEG and eye-tracking data.

## Key findings

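- High-density EEG (116 channels) significantly outperformed low-density EEG (32 channels) for needle grasping, positioning, and entry (subtask 1; p = 0.001), but not for needle driving and suture pull-out (subtask 2; p = 0.15).
- Adding eye-tracking data significantly improved classification for needle driving with wrist rotation and suture pull-out (subtask 2; p = 0.001), but not for subtask 1 (p = 0.5).
- Multimodal EEG and eye-tracking models classified skill level (inexperienced, competent, experienced) on held-out participants with weighted F-scores of 0.84 for subtask 1 and 0.89 for subtask 2.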
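The significance claims above come from paired t-tests. Below is a minimal sketch of the paired t statistic, assuming the pairing unit is the repeated evaluation iteration: each iteration's held-out participants are scored by both configurations (for example, 116-channel vs 32-channel EEG), giving one pair of weighted F-scores per iteration. The summary does not state the pairing unit, so treat that as an assumption; `paired_t_statistic` is a hypothetical helper, not code from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired (dependent-samples) t statistic for two lists of scores.

    Assumption: scores_a[i] and scores_b[i] come from the same evaluation
    iteration (same held-out participants), e.g. per-iteration weighted
    F-scores of a 116-channel model and a 32-channel model.
    Returns (t, degrees_of_freedom).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    # Work on per-pair differences; this is what makes the test "paired".
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    # t = mean difference / standard error of the mean difference
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1
```

In practice, `scipy.stats.ttest_rel(scores_a, scores_b)` computes the same statistic and also returns a two-sided p-value; the hand-written version only makes the arithmetic explicit.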

## Abstract

This study aimed to classify robot-assisted surgery (RAS) skill levels (inexperienced, competent, and experienced) during performance of vesico-urethral anastomosis (VUA) using multimodal physiological signals. We trained Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models on data collected from 23 participants (RAS trainees and experienced surgeons), each performing two VUAs on animal tissue. The dataset included 116-channel electroencephalogram (EEG) and 20 eye-tracking signals recorded during two VUA subtasks: (1) needle grasping, positioning, and entry; and (2) needle driving with wrist rotation and suture pull-out. Skill levels were rated by three raters using the Robotic Anastomosis Competency Evaluation (RACE) tool. Model hyperparameters were tuned by grid search with group 4-fold cross-validation on 16 participants, and final model performance was evaluated on data from 7 unseen (held-out test) participants, repeated over 10 iterations. Weighted F-scores for classifying skill level using EEG and eye-tracking data were 0.84 for subtask 1 and 0.89 for subtask 2. Using paired t-tests, high-density EEG (116 channels) significantly outperformed low-density EEG (32 channels) for subtask 1 (p = 0.001), with no significant difference for subtask 2 (p = 0.15). Adding eye-tracking data significantly improved classification for subtask 2 (p = 0.001), but not for subtask 1 (p = 0.5). Multimodal deep learning using EEG and eye-tracking data enabled objective classification of surgical skills during VUA. The benefits of high-density EEG and multimodal integration were task-dependent, underscoring the need to tailor assessment tools to the cognitive and sensorimotor demands of specific surgical subtasks.
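The evaluation protocol above splits data by participant: 7 held-out test participants, plus group 4-fold cross-validation on the remaining 16 for the grid search. The sketch below shows one way to build such participant-level splits and to compute the weighted F-score used as the headline metric. The helper names, the window-level `sample_groups` list (one participant ID per EEG/eye-tracking window), and the choice to redraw the held-out set with a new seed on each of the 10 iterations are assumptions for illustration; the paper may instead fix the split and vary only model initialization.

```python
import random
from collections import Counter


def split_participants(participant_ids, n_test, k, seed):
    """Hold out n_test participants, then partition the rest into k folds.

    Hypothetical helper: splitting by participant (not by window or trial)
    keeps every recording of one person on a single side of each split.
    """
    ids = sorted(set(participant_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    test_ids = ids[:n_test]
    dev_ids = ids[n_test:]
    folds = [dev_ids[i::k] for i in range(k)]  # near-equal fold sizes
    return test_ids, folds


def group_kfold_indices(sample_groups, folds):
    """Yield (train_idx, val_idx) sample indices for each participant fold.

    Samples whose participant is in no fold (the held-out test set) are
    excluded from both lists.
    """
    dev = {p for fold in folds for p in fold}
    for val_ids in folds:
        val = set(val_ids)
        train_idx = [i for i, g in enumerate(sample_groups) if g in dev and g not in val]
        val_idx = [i for i, g in enumerate(sample_groups) if g in val]
        yield train_idx, val_idx


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to true-class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_c - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n_c / total) * f1
    return score
```

With 23 participant IDs, `split_participants(ids, n_test=7, k=4, seed=iteration)` yields 7 test participants and four folds of 4, matching the 16-participant cross-validation described above. Splitting by participant rather than by window matters because windows from the same surgeon are strongly correlated, so mixing them across training and validation would likely inflate the scores. `weighted_f1` follows the same definition as scikit-learn's `f1_score(..., average="weighted")`.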

The online version contains supplementary material available at 10.1007/s11701-026-03290-z.

## Full-text entities

- **Diseases:** tissue trauma (MESH:D014947), blood loss (MESH:D016063), Cancer (MESH:D009369)
- **Species:** Sus scrofa (pig, species) [taxon 9823], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12971764/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12971764/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/PMC12971764/full.md

---
Source: https://tomesphere.com/paper/PMC12971764