# Towards Surgical Context Inference and Translation to Gestures

**Authors:** Kay Hutchinson, Zongyu Li, Ian Reyes, Homa Alemzadeh

arXiv: 2302.14237 · 2023-03-17

## TL;DR

This paper introduces an automated, explainable method that generates surgical gesture transcripts from image segmentation masks, shortening the gesture labeling process in robot-assisted surgery by about 2.8 times.

## Contribution

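The paper presents a three-stage pipeline: image segmentation, detection of surgical context from the segmentation masks, and translation of context labels into gesture transcripts using knowledge-based Finite State Machine (FSM) and data-driven Long Short Term Memory (LSTM) models. The segmentation models reach state-of-the-art performance on needle and thread in Suturing, and the method shortens gesture labeling by about 2.8 times.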
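To make the gesture translation stage concrete, here is a minimal sketch of a knowledge-based FSM that maps a sequence of context labels to a gesture transcript. Everything in it is an assumption made for illustration: the two-flag context encoding, the gesture names (`REACH`, `POSITION`, `PUSH`, `RELEASE`), the transition table, and the `translate` function are hypothetical and are not taken from the paper or from JIGSAWS.

```python
from typing import Dict, List, Tuple

# Hypothetical context encoding: (grasper_holds_needle, needle_in_tissue).
Context = Tuple[bool, bool]

# Hypothetical transition table: (current gesture, observed context) -> next gesture.
TRANSITIONS: Dict[Tuple[str, Context], str] = {
    ("REACH", (True, False)): "POSITION",
    ("POSITION", (True, True)): "PUSH",
    ("PUSH", (False, True)): "RELEASE",
    ("RELEASE", (False, False)): "REACH",
}


def translate(contexts: List[Context], start: str = "REACH") -> List[str]:
    """Emit one gesture label per context frame.

    The machine stays in its current gesture whenever the observed
    context does not match any transition out of that gesture.
    """
    gesture = start
    transcript = []
    for ctx in contexts:
        gesture = TRANSITIONS.get((gesture, ctx), gesture)
        transcript.append(gesture)
    return transcript


example_contexts = [
    (False, False),  # nothing held yet
    (True, False),   # grasper picks up the needle
    (True, False),   # still holding, not yet in tissue
    (True, True),    # needle enters tissue
    (False, True),   # grasper lets go
]
# -> ['REACH', 'POSITION', 'POSITION', 'PUSH', 'RELEASE']
print(translate(example_contexts))
```

In this sketch an unrecognized context leaves the current gesture unchanged, which is one simple way a rule-based translator can tolerate noisy context labels; whether the paper's FSMs behave this way is not stated in this summary.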

## Key findings

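- The segmentation models achieve state-of-the-art performance in recognizing needle and thread in the Suturing task.
- Automatically detected surgical states (e.g., contact between graspers and objects in Suturing) agree closely with crowd-sourced labels.
- FSM models are more robust than LSTMs to poor segmentation and labeling performance.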

## Abstract

Manual labeling of gestures in robot-assisted surgery is labor intensive, prone to errors, and requires expertise or training. We propose a method for automated and explainable generation of gesture transcripts that leverages the abundance of data for image segmentation. Surgical context is detected using segmentation masks by examining the distances and intersections between the tools and objects. Next, context labels are translated into gesture transcripts using knowledge-based Finite State Machine (FSM) and data-driven Long Short Term Memory (LSTM) models. We evaluate the performance of each stage of our method by comparing the results with the ground truth segmentation masks, the consensus context labels, and the gesture labels in the JIGSAWS dataset. Our results show that our segmentation models achieve state-of-the-art performance in recognizing needle and thread in Suturing and we can automatically detect important surgical states with high agreement with crowd-sourced labels (e.g., contact between graspers and objects in Suturing). We also find that the FSM models are more robust to poor segmentation and labeling performance than LSTMs. Our proposed method can significantly shorten the gesture labeling process (~2.8 times).
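The abstract describes context detection as examining distances and intersections between tool and object masks. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function names, the `max_gap_px` threshold, and the toy 10x10 masks are all assumptions chosen for illustration.

```python
import numpy as np


def masks_intersect(mask_a: np.ndarray, mask_b: np.ndarray) -> bool:
    """Return True if two boolean masks share at least one pixel."""
    return bool(np.logical_and(mask_a, mask_b).any())


def min_pixel_distance(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Smallest Euclidean distance (in pixels) between the two masks.

    Returns infinity if either mask is empty. Uses a brute-force
    pairwise comparison, so it is only suitable for small masks.
    """
    pts_a = np.argwhere(mask_a)
    pts_b = np.argwhere(mask_b)
    if len(pts_a) == 0 or len(pts_b) == 0:
        return float("inf")
    diffs = pts_a[:, None, :] - pts_b[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())


def in_contact(tool_mask: np.ndarray, object_mask: np.ndarray,
               max_gap_px: float = 2.0) -> bool:
    """Hypothetical contact rule: masks overlap or lie within max_gap_px."""
    if masks_intersect(tool_mask, object_mask):
        return True
    return min_pixel_distance(tool_mask, object_mask) <= max_gap_px


# Toy masks (assumed shapes and positions, not real surgical data).
grasper = np.zeros((10, 10), dtype=bool)
grasper[2:4, 2:4] = True
needle = np.zeros((10, 10), dtype=bool)
needle[3:6, 3:6] = True   # overlaps the grasper at pixel (3, 3)
thread = np.zeros((10, 10), dtype=bool)
thread[8:10, 8:10] = True  # far from the grasper

print(in_contact(grasper, needle))  # True
print(in_contact(grasper, thread))  # False
```

A fixed pixel threshold is the simplest possible rule; a real pipeline would likely need thresholds tuned per tool and object pair, and a faster distance computation for full-resolution frames.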

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14237/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14237/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/2302.14237/full.md

---
Source: https://tomesphere.com/paper/2302.14237