Deep learning-based computer vision to recognize and classify suturing gestures in robot-assisted surgery

Francisco Luongo (1), Ryan Hakim (2), Jessica H. Nguyen (2), Animashree Anandkumar (3), Andrew J Hung (2) ((1) Department of Biology and Biological Engineering, Caltech; (2) Center for Robotic Simulation & Education, Catherine & Joseph Aresty Department of Urology, USC Institute of Urology, University of Southern California; (3) Department of Computing & Mathematical Sciences, Caltech)

arXiv:2008.11833·cs.CV·August 28, 2020

TL;DR

This study develops deep learning computer vision models to automatically identify and classify suturing gestures in robot-assisted surgery, reaching AUCs of 0.88 for identification and 0.87 for classification and showing potential for automating surgical skill assessment.

Contribution

The paper introduces a deep learning approach for automated recognition and classification of suturing gestures using live surgical videos, advancing surgical skill analysis.

Findings

01

Models reliably predict gesture presence with AUC 0.88

02

Models classify gesture types with AUC 0.87

03

Recurrent model choice does not significantly affect performance

Abstract

Our previous work classified a taxonomy of suturing gestures during a vesicourethral anastomosis of robotic radical prostatectomy in association with tissue tears and patient outcomes. Herein, we train deep learning-based computer vision (CV) models to automate the identification and classification of suturing gestures for needle driving attempts. Using two independent raters, we manually annotated live suturing video clips to label timepoints and gestures. Identification (2395 videos) and classification (511 videos) datasets were compiled to train CV models to produce two- and five-class label predictions, respectively. Networks were trained on inputs of raw RGB pixels as well as optical flow for each frame. Each model was trained on 80/20 train/test splits. In this study, all models were able to reliably predict the presence of a gesture (identification, AUC: 0.88) as well as the type…
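To make the evaluation terms in the abstract concrete, here is a minimal sketch of an 80/20 split and a two-class AUC computation in plain Python. It is an illustration only, not the authors' pipeline: the helper names `split_80_20` and `roc_auc` are hypothetical, the split is a simple random shuffle over clip indices (the abstract does not say how clips were assigned to splits), and the labels and scores are toy values.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and split it 80/20 (hypothetical helper).

    Assumption: a plain random split over clips; the abstract does not
    state whether the split was made per clip, per case, or per surgeon.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Two-class AUC: the fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# The clip count from the abstract is used only as an example dataset size.
train_ids, test_ids = split_80_20(range(2395))

# Toy labels (1 = gesture present) and toy model scores, not real results.
toy_labels = [1, 1, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.7]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.88 should be read. The pairwise loop is quadratic in the number of examples; it is written this way for clarity rather than speed.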
