# Development of an artificial intelligence algorithm for automated surgical gestures annotation

**Authors:** Rikke Groth Olsen, Flemming Bjerrum, Annarita Ghosh Andersen, Lars Konge, Andreas Røder, Morten Bo Søndergaard Svendsen

PMC · DOI: 10.1007/s11701-025-02556-2 · 2025-07-18

## TL;DR

This paper presents an AI algorithm that automatically labels surgical gestures in videos of simulated robot-assisted radical prostatectomies, reducing the need for time-consuming manual annotation.

## Contribution

A model for automated surgical gesture annotation that pairs a pre-trained VisionTransformer feature extractor with a recurrent classification head (an LSTM followed by a fully connected layer).

## Key findings

- The model achieved an AUC of 0.95 (95% CI 0.93–0.96) and an F1-score of 0.71 (95% CI 0.67–0.75) for surgical gesture classification.
- Per-gesture accuracy (0.84–0.97) and specificity (0.90–0.99) were high, though sensitivity was lower (0.62–0.81).
- Average Total Agreement per gesture class ranged from 0.72 to 0.91; Total Agreement is the authors' extended version of Intersection over Union (IoU).
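The per-gesture accuracy, sensitivity, and specificity above are one-vs-rest quantities: each gesture in turn is treated as the positive class and the other four as negatives. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `one_vs_rest_metrics` and the toy labels are hypothetical and chosen only for illustration.

```python
# Sketch only: one-vs-rest metrics for a single gesture class.
# The label lists below are made-up toy data, not results from the paper.

def one_vs_rest_metrics(y_true, y_pred, positive):
    """Treat `positive` as the positive class and every other gesture as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return NaN instead of raising when a class never occurs.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": ratio(tp, tp + fn),   # recall on the positive class
        "specificity": ratio(tn, tn + fp),   # recall on the pooled negatives
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


y_true = ["Suturing", "Suturing", "Suturing",
          "Clip application", "Needle handling", "Regular dissection"]
y_pred = ["Suturing", "Suturing", "Needle handling",
          "Clip application", "Needle handling", "Suturing"]

suturing = one_vs_rest_metrics(y_true, y_pred, "Suturing")
print(suturing)
```

With five classes, the negatives for any one gesture pool the other four, so true negatives tend to dominate the counts. That is one plausible reason specificity and accuracy can sit well above sensitivity, as in the reported ranges.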

## Abstract

Surgical gesture analysis is a promising method to assess surgical procedure quality, but manual annotation is time-consuming. We aimed to develop a recurrent neural network for automated surgical gesture annotation using simulated robot-assisted radical prostatectomies. We have previously manually annotated 161 videos with five different surgical gestures (Regular dissection, Hemostatic control, Clip application, Needle handling, and Suturing). We created a model consisting of two neural networks: a pre-trained feature extractor (VisionTransformer using ImageNet) and a classification head (a recurrent neural network with a Long Short-Term Memory layer, LSTM(128), and a fully connected layer). The data set was split into a training + validation set and a test set. The trained model labeled input sequences with one of the five surgical gestures. The overall performance of the neural networks was assessed by metrics for multi-label classification and by a defined measure, Total Agreement, an extended version of Intersection over Union (IoU). Our neural network could predict the class of surgical gestures with an Area Under the Curve (AUC) of 0.95 (95% CI 0.93–0.96) and an F1-score of 0.71 (95% CI 0.67–0.75). The network could classify each surgical gesture with high accuracies (0.84–0.97) and high specificities (0.90–0.99), but with lower sensitivities (0.62–0.81). The average Total Agreement for each gesture class was between 0.72 (95% CI ± 0.03) and 0.91 (95% CI ± 0.02). We successfully developed a high-performing neural network to analyze gestures in simulated surgical procedures. Our next step is to use the network to annotate videos and evaluate their efficacy in predicting patient outcomes.
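The classification head described in the abstract (an LSTM with 128 units followed by a fully connected layer, applied to features from a pre-trained extractor) can be sketched in PyTorch as below. This is an illustrative sketch under stated assumptions, not the authors' implementation. The class name `GestureHead`, the 768-dimensional feature size, the 16-frame sequence length, and the use of the last hidden state are all assumptions. Random tensors stand in for the VisionTransformer features.

```python
# Sketch only: an LSTM(128) + fully connected head over per-frame features.
# Assumed, not from the paper: feature_dim=768, seq_len=16, last-hidden-state readout.
import torch
from torch import nn


class GestureHead(nn.Module):
    def __init__(self, feature_dim=768, hidden_size=128, num_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, features):
        # features: (batch, seq_len, feature_dim), one feature vector per frame
        _, (h_n, _) = self.lstm(features)
        # h_n[-1] is the final hidden state of the last LSTM layer: (batch, hidden_size)
        return self.fc(h_n[-1])  # logits: (batch, num_classes)


torch.manual_seed(0)
head = GestureHead()
features = torch.randn(2, 16, 768)   # stand-in for frozen feature-extractor output
logits = head(features)
predicted = logits.argmax(dim=1)     # one gesture index per input sequence
print(logits.shape, predicted)
```

Keeping the feature extractor separate from the head means the per-frame features could be computed once and cached, so only the small recurrent head would need training. Whether the authors froze or fine-tuned the extractor is not stated in the abstract.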

The online version contains supplementary material available at 10.1007/s11701-025-02556-2.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12274238/full.md

---
Source: https://tomesphere.com/paper/PMC12274238