Recognition and Prediction of Surgical Gestures and Trajectories Using Transformer Models in Robot-Assisted Surgery
Chang Shi, Yi Zheng, Ann Majewicz Fey

TL;DR
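This paper introduces a Transformer-based approach for recognizing and predicting surgical gestures and trajectories in robot-assisted surgery. Using only kinematic data, it reports up to 89.3% gesture recognition accuracy and, one second ahead, 84.6% gesture prediction accuracy and 2.71 mm trajectory prediction error, which the summary presents as enabling real-time applications.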
Contribution
The paper adapts the Transformer architecture to three tasks in robot-assisted surgery (RAS): gesture recognition, gesture prediction, and trajectory prediction, using only kinematic data from the surgical robot end-effectors.
Findings
Achieved up to 89.3% gesture recognition accuracy
Attained 84.6% gesture prediction accuracy (1 second ahead)
Achieved 2.71 mm trajectory prediction error (1 second ahead)
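The findings above report two kinds of numbers: a label accuracy and a distance error. The paper's exact metric definitions are not given in this summary, so the following is only a minimal sketch that assumes accuracy is computed frame by frame and that trajectory error is the mean Euclidean distance between predicted and reference 3D positions. The function names and the gesture labels ("G1", "G2", "G3") are made up for illustration.

```python
import math


def frame_accuracy(predicted, actual):
    """Fraction of frames whose predicted gesture label equals the reference label.

    Assumption: frame-wise accuracy; the paper may define accuracy differently.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and equally long")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def mean_position_error(predicted, actual):
    """Mean Euclidean distance between predicted and reference 3D points.

    Assumption: positions share one unit (e.g. mm) and are aligned frame by frame.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and equally long")
    total = sum(math.dist(p, a) for p, a in zip(predicted, actual))
    return total / len(actual)


# Toy data, not taken from the paper.
labels_pred = ["G1", "G1", "G2", "G3"]
labels_true = ["G1", "G2", "G2", "G3"]
points_pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
points_true = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

print(frame_accuracy(labels_pred, labels_true))       # 3 of 4 frames match
print(mean_position_error(points_pred, points_true))  # (0 + 5) / 2
```

With the toy data, three of four labels match and the two point distances are 0 and 5, so the printed values are 0.75 and 2.5.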
Abstract
Surgical activity recognition and prediction can help provide important context in many Robot-Assisted Surgery (RAS) applications, for example, surgical progress monitoring and estimation, surgical skill evaluation, and shared control strategies during teleoperation. Transformer models were first developed for Natural Language Processing (NLP) to model word sequences and soon the method gained popularity for general sequence modeling tasks. In this paper, we propose the novel use of a Transformer model for three tasks: gesture recognition, gesture prediction, and trajectory prediction during RAS. We modify the original Transformer architecture to be able to generate the current gesture sequence, future gesture sequence, and future trajectory sequence estimations using only the current kinematic data of the surgical robot end-effectors. We evaluate our proposed models on the JHU-ISI…
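The prediction tasks described in the abstract map a stretch of current kinematic data to a stretch of future data. One common way to set up such a problem is to slice a recording into pairs of (input window, future window); the sketch below shows that slicing only. It is an assumption about data preparation, not the authors' pipeline: the function name `make_windows`, the window lengths, and the toy frames are hypothetical. For a recording sampled at some rate R Hz, a one-second horizon would correspond to R frames.

```python
def make_windows(frames, input_len, horizon):
    """Pair each run of `input_len` frames with the `horizon` frames that follow it.

    `frames` is any sequence of per-timestep records (e.g. kinematic vectors).
    Returns a list of (input_window, future_window) tuples; the list is empty
    when the recording is shorter than input_len + horizon.
    """
    if input_len <= 0 or horizon <= 0:
        raise ValueError("input_len and horizon must be positive")
    pairs = []
    last_start = len(frames) - input_len - horizon
    for start in range(last_start + 1):
        split = start + input_len
        pairs.append((frames[start:split], frames[split:split + horizon]))
    return pairs


# Toy recording: integers stand in for per-frame kinematic vectors.
recording = list(range(10))
pairs = make_windows(recording, input_len=4, horizon=3)

print(len(pairs))   # number of (input, future) pairs
print(pairs[0])     # first pair
print(pairs[-1])    # last pair
```

For the ten-frame toy recording this yields four pairs, from ([0, 1, 2, 3], [4, 5, 6]) to ([3, 4, 5, 6], [7, 8, 9]). A sequence model would then be trained to produce the second element of each pair from the first.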
Taxonomy
Topics: Surgical Simulation and Training · Anatomy and Medical Technology · Stroke Rehabilitation and Recovery
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Linear Layer · Dense Connections · Adam
