Multi-Modal Unsupervised Pre-Training for Surgical Operating Room Workflow Analysis
Muhammad Abdullah Jamal, Omid Mohareri

TL;DR
This paper introduces a novel multi-modal unsupervised learning approach for surgical workflow analysis, leveraging unlabeled robotic surgery data to improve activity recognition and segmentation performance.
Contribution
It proposes a new method to fuse multi-modal data as different views for unsupervised training via clustering, enhancing surgical video analysis.
Findings
Outperforms existing methods in activity recognition.
Achieves superior semantic segmentation results.
Effectively leverages unlabeled data for surgical workflow analysis.
Abstract
Data-driven approaches to assist operating room (OR) workflow analysis depend on large curated datasets that are time-consuming and expensive to collect. On the other hand, we see a recent paradigm shift from supervised learning to self-supervised and/or unsupervised learning approaches that can learn representations from unlabeled datasets. In this paper, we leverage the unlabeled data captured in robotic surgery ORs and propose a novel way to fuse the multi-modal data for a single video frame or image. Instead of producing different augmentations (or 'views') of the same image or video frame, which is common practice in self-supervised learning, we treat the multi-modal data as different views to train the model in an unsupervised manner via clustering. We compared our method with other state-of-the-art methods, and the results show the superior performance of our approach on surgical…
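The abstract does not name the clustering objective or list the modalities, so the following is only a minimal sketch under stated assumptions. It swaps in a SwAV-style swapped-prediction loss with Sinkhorn-Knopp cluster assignments as one plausible way to "treat the multi-modal data as different views". The two views are assumed to be embeddings of the same frames from two modalities (for example an intensity image and a depth map); the encoders are not shown, and the names `sinkhorn`, `swapped_prediction_loss`, `rgb_emb`, `depth_emb` and the hyperparameter values are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Project vectors onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def sinkhorn(scores, n_iters=3, epsilon=0.05):
    """Balanced soft cluster assignments via Sinkhorn-Knopp (SwAV-style, assumed).

    scores: (B, K) similarities of B frames to K prototypes.
    Returns Q with shape (B, K); each row sums to 1.
    """
    # Subtracting the max only rescales Q, which the normalization removes.
    q = np.exp((scores - scores.max()) / epsilon).T  # (K, B)
    q /= q.sum()
    k, b = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True)  # each prototype gets mass 1/K
        q /= k
        q /= q.sum(axis=0, keepdims=True)  # each frame gets mass 1/B
        q /= b
    return (q * b).T  # rescale so every frame's assignment sums to 1


def log_softmax(x, temperature=0.1):
    z = x / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def swapped_prediction_loss(z_a, z_b, prototypes, temperature=0.1):
    """z_a, z_b: (B, D) embeddings of the SAME frames from two modalities.

    Each modality acts as a 'view' in place of an augmented copy of the frame.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    c = l2_normalize(prototypes)  # (K, D) shared cluster prototypes
    s_a, s_b = z_a @ c.T, z_b @ c.T  # (B, K) prototype scores per modality
    # Cluster codes act as targets (treated as constants during training).
    q_a, q_b = sinkhorn(s_a), sinkhorn(s_b)
    # Predict modality B's code from modality A's scores, and vice versa.
    loss_ab = -(q_b * log_softmax(s_a, temperature)).sum(axis=1).mean()
    loss_ba = -(q_a * log_softmax(s_b, temperature)).sum(axis=1).mean()
    return 0.5 * (loss_ab + loss_ba)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, D, K = 8, 16, 4
    rgb_emb = rng.normal(size=(B, D))    # hypothetical image-encoder output
    depth_emb = rng.normal(size=(B, D))  # hypothetical depth-encoder output
    prototypes = rng.normal(size=(K, D))
    print(float(swapped_prediction_loss(rgb_emb, depth_emb, prototypes)))
```

The equal-mass constraint in `sinkhorn` keeps all frames from collapsing into one cluster, which is why clustering-based methods of this kind can train without labels; in a real model the loss would be backpropagated into both modality encoders and the prototypes.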
Taxonomy
Topics: Surgical Simulation and Training · Traumatic Brain Injury and Neurovascular Disturbances · Cardiac and Coronary Surgery Techniques
