Multimodal and self-supervised representation learning for automatic gesture recognition in surgical robotics

Aniruddha Tamhane; Jie Ying Wu; Mathias Unberath

arXiv:2011.00168·cs.CV·November 3, 2020

TL;DR

This paper introduces a self-supervised, multi-modal learning approach for surgical gesture recognition that learns from video and kinematic data, aiming to build a more general machine understanding of the surgical process and to reduce reliance on expert annotations.

Contribution

It develops an encoder-decoder framework that encodes surgical videos into representations and decodes them into kinematics, and it evaluates the learnt representations on gesture recognition, transfer learning, and surgeon skill classification.

Findings

01. Gesture recognition accuracy between 69.6% and 77.8%.

02. Transfer learning accuracy between 44.6% and 64.8%.

03. Surgeon skill classification accuracy between 76.8% and 81.2%.

Abstract

Self-supervised, multi-modal learning has been successful in building holistic representations of complex scenarios. Such representations consolidate information from multiple modalities and can serve multiple, versatile uses. Applied to surgical robotics, this approach can simultaneously develop a generalised machine understanding of the surgical process and reduce the dependency on high-quality expert annotations, which are generally difficult to obtain. We develop a self-supervised, multi-modal representation learning paradigm that learns representations for surgical gestures from video and kinematics. We use an encoder-decoder network configuration that encodes representations from surgical videos and decodes them to yield kinematics. We quantitatively demonstrate the efficacy of our learnt representations for gesture recognition (with accuracy between 69.6% and 77.8%), transfer learning…
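The video-to-kinematics objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: plain linear maps stand in for the encoder and decoder, the data are synthetic, and every name and dimension (`W_enc`, `W_dec`, `FRAME_DIM`, `LATENT_DIM`, `KIN_DIM`) is a hypothetical choice made for illustration. The point it shows is that the recorded kinematics act as the training target, so no gesture labels are needed to learn the intermediate representation `z`.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def init(rows, cols, rng, scale=0.1):
    """Random matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def forward(W_enc, W_dec, frame):
    """Encode frame features into z, then decode z into predicted kinematics."""
    z = matvec(W_enc, frame)
    k_hat = matvec(W_dec, z)
    return z, k_hat

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def train_step(W_enc, W_dec, frame, kin, lr):
    """One SGD step on the kinematics reconstruction loss (no gesture labels)."""
    z, k_hat = forward(W_enc, W_dec, frame)
    n = len(kin)
    err = [2 * (p - t) / n for p, t in zip(k_hat, kin)]  # dL/dk_hat
    # dL/dz, computed with the decoder weights from before this update
    grad_z = [sum(W_dec[i][j] * err[i] for i in range(n)) for j in range(len(z))]
    for i in range(n):
        for j in range(len(z)):
            W_dec[i][j] -= lr * err[i] * z[j]
    for j in range(len(z)):
        for k in range(len(frame)):
            W_enc[j][k] -= lr * grad_z[j] * frame[k]
    return mse(k_hat, kin)

# Synthetic stand-ins (assumptions): per-frame feature vectors and kinematics
# generated from them by a fixed linear map.
rng = random.Random(0)
FRAME_DIM, LATENT_DIM, KIN_DIM = 8, 4, 3
true_map = init(KIN_DIM, FRAME_DIM, rng, scale=1.0)
frames = [[rng.uniform(-1, 1) for _ in range(FRAME_DIM)] for _ in range(64)]
kinematics = [matvec(true_map, f) for f in frames]

W_enc = init(LATENT_DIM, FRAME_DIM, rng)
W_dec = init(KIN_DIM, LATENT_DIM, rng)

def dataset_loss():
    """Mean reconstruction loss over all synthetic frames."""
    total = sum(mse(forward(W_enc, W_dec, f)[1], k) for f, k in zip(frames, kinematics))
    return total / len(frames)

loss_before = dataset_loss()
for _ in range(200):
    for f, k in zip(frames, kinematics):
        train_step(W_enc, W_dec, f, k, lr=0.05)
loss_after = dataset_loss()
```

After training, the encoder output `z` for a frame is the kind of vector that could be passed to a separate downstream classifier, for example for gesture recognition or skill classification. How the paper's own pipeline does this is not specified in the text above.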

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems