Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness

Idris Hamoud; Alexandros Karargyris; Aidean Sharghi; Omid Mohareri; Nicolas Padoy

arXiv:2407.05448·cs.CV·July 9, 2024

TL;DR

This paper introduces a 3D self-supervised learning method that uses depth data from time-of-flight (ToF) cameras to improve semantic segmentation and activity classification in operating rooms, reducing the reliance on annotated data.

Contribution

The paper presents a new 3D self-supervised pretext task: predicting the relative 3D distance between image patches of operating room (OR) scenes from their depth maps. The features learned this way support downstream clinical workflow understanding.

Findings

1. Significant performance improvements on two clinical datasets.
2. Enhanced effectiveness in low-annotation regimes.
3. Demonstrated utility of learning 3D spatial context.

Abstract

Semantic segmentation and activity classification are key components for building intelligent surgical systems that can understand and assist the clinical workflow. In the Operating Room (OR), semantic segmentation is at the core of creating robots aware of their clinical surroundings, whereas activity classification aims at understanding OR workflow at a higher level. State-of-the-art semantic segmentation and activity recognition approaches are fully supervised and therefore do not scale well. Self-supervision can reduce the amount of annotated data needed. We propose a new 3D self-supervised task for OR scene understanding that uses OR scene images captured with Time-of-Flight (ToF) cameras. Unlike other self-supervised approaches, whose handcrafted pretext tasks focus on 2D image features, our proposed task consists of predicting the relative 3D distance of image patches by exploiting the depth maps. Learning 3D…
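The pretext task described in the abstract can be pictured as a label generator: back-project the ToF depth map into 3D, take the 3D centroid of each of two image patches, and use their Euclidean distance as the training target. What follows is a minimal sketch under stated assumptions, not the authors' implementation. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the patch size, the centroid representation, the helper names (`backproject`, `patch_centroid`, `distance_label`), and the `bin_edges` that turn a distance into a class index are all hypothetical. The title refers to cluster distances, but how the authors define clusters is not stated here, so fixed distance bins stand in for them.

```python
import numpy as np

# Hypothetical sketch of patch-distance pretext labels from a ToF depth map.
# Intrinsics, patch size, helper names and bin edges are illustrative
# assumptions, not values or code from the paper.


def backproject(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H, W, 3) point map (pinhole model)."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def patch_centroid(points, top, left, size):
    """Mean 3D point of a square patch, skipping pixels with no depth (z == 0)."""
    patch = points[top:top + size, left:left + size].reshape(-1, 3)
    valid = patch[:, 2] > 0
    if not valid.any():
        return None  # no usable depth in this patch
    return patch[valid].mean(axis=0)


def distance_label(depth, patch_a, patch_b, size, intrinsics, bin_edges):
    """Return (3D distance, class index) for two patches given as (top, left).

    Returns None if either patch has no valid depth.
    """
    points = backproject(depth, *intrinsics)
    ca = patch_centroid(points, patch_a[0], patch_a[1], size)
    cb = patch_centroid(points, patch_b[0], patch_b[1], size)
    if ca is None or cb is None:
        return None
    dist = float(np.linalg.norm(ca - cb))
    # np.digitize maps the distance to a bin index in 0..len(bin_edges)
    label = int(np.digitize(dist, bin_edges))
    return dist, label
```

For example, on a toy constant depth map `np.full((8, 8), 2.0)` with intrinsics `(1.0, 1.0, 0.0, 0.0)`, calling `distance_label(depth, (0, 0), (0, 4), 2, (1.0, 1.0, 0.0, 0.0), [1.0, 4.0, 10.0])` returns `(8.0, 2)`. Binning turns distance regression into classification, a common choice for pretext tasks; regressing `dist` directly would be an equally plausible variant.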
