Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness
Idris Hamoud, Alexandros Karargyris, Aidean Sharghi, Omid Mohareri, Nicolas Padoy

TL;DR
This paper introduces a novel 3D self-supervised learning method using ToF camera data to improve semantic segmentation and activity classification in operating rooms, reducing reliance on annotated data.
Contribution
The paper presents a new 3D self-supervised task based on predicting relative 3D distances in OR scenes, enhancing feature learning for clinical workflow understanding.
Findings
Significant performance improvements on two clinical datasets.
Enhanced effectiveness in low-annotation regimes.
Demonstrated utility of 3D spatial context learning.
Abstract
Semantic segmentation and activity classification are key components to creating intelligent surgical systems able to understand and assist clinical workflow. In the Operating Room, semantic segmentation is at the core of creating robots aware of clinical surroundings, whereas activity classification aims at understanding OR workflow at a higher level. State-of-the-art semantic segmentation and activity recognition approaches are fully supervised, which is not scalable. Self-supervision can decrease the amount of annotated data needed. We propose a new 3D self-supervised task for OR scene understanding utilizing OR scene images captured with ToF cameras. Contrary to other self-supervised approaches, where handcrafted pretext tasks are focused on 2D image features, our proposed task consists of predicting the relative 3D distance of image patches by exploiting the depth maps. Learning 3D…
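The pretext task described above, predicting the relative 3D distance between image patches from a depth map, can be illustrated with a minimal sketch of how a regression target might be built. This is not the authors' implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the use of the median depth per patch, and all function names are assumptions made for illustration only.

```python
import numpy as np


def backproject(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z to a 3D point (pinhole model, assumed)."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])


def patch_center_3d(depth, top, left, size, fx, fy, cx, cy):
    """Return a 3D point for a square patch, or None if it has no valid depth.

    Assumption: invalid ToF pixels are stored as 0, and the patch depth is
    summarized by the median of its valid pixels.
    """
    patch = depth[top:top + size, left:left + size]
    valid = patch[patch > 0]
    if valid.size == 0:
        return None
    z = float(np.median(valid))
    u = left + (size - 1) / 2.0  # patch center, column coordinate
    v = top + (size - 1) / 2.0   # patch center, row coordinate
    return backproject(u, v, z, fx, fy, cx, cy)


def patch_distance_target(depth, patch_a, patch_b, size, intrinsics):
    """Euclidean 3D distance between two patches given as (top, left) corners.

    This value could serve as the label a network is trained to predict
    from the two image patches alone.
    """
    point_a = patch_center_3d(depth, *patch_a, size, *intrinsics)
    point_b = patch_center_3d(depth, *patch_b, size, *intrinsics)
    if point_a is None or point_b is None:
        return None
    return float(np.linalg.norm(point_a - point_b))


# Hypothetical example: a flat surface 2 m from the camera.
depth_map = np.full((100, 100), 2.0)
intrinsics = (100.0, 100.0, 0.0, 0.0)  # fx, fy, cx, cy (made-up values)
target = patch_distance_target(depth_map, (0, 0), (0, 50), 10, intrinsics)
```

In this toy setup the two patches lie on the same fronto-parallel plane, so the target depends only on their horizontal pixel offset scaled by depth and focal length. With real ToF data the depth term would also contribute, which is what distinguishes this target from a purely 2D patch-position task.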
Taxonomy
Methods: Attentive Walk-Aggregating Graph Neural Network
