Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets

Muhammad Abdullah Jamal; Omid Mohareri

arXiv:2407.19714 · cs.CV · July 30, 2024


TL;DR

This paper introduces SurgDepth, a simple multi-modal fusion framework based on Vision Transformers for surgical scene semantic segmentation, achieving state-of-the-art results across multiple datasets.

Contribution

SurgDepth is a novel RGB-D fusion method using ViTs that outperforms existing approaches by effectively encoding both modalities with a simple fusion mechanism.
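This page does not say which operator SurgDepth uses to fuse the two modalities. The sketch below shows one common "simple fusion" choice for ViT inputs. It assumes the RGB and depth maps are patchified separately, each patch is linearly embedded with its own weights, and the two token sequences are summed element-wise. The helper names (`patchify`, `fuse_tokens`) and the sum operator are hypothetical illustrations, not the authors' implementation. In a real model, the fused tokens would then pass through the ViT encoder and decoder, and the weights would be learned.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]       # rows = embedding dim, cols = flattened patch dim
Image = List[List[List[float]]]  # H x W x C, nested lists

def linear(x: Vector, w: Matrix) -> Vector:
    """Project a flattened patch with weight matrix w (no bias)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def patchify(img: Image, p: int) -> List[Vector]:
    """Split an H x W x C image into non-overlapping p x p patches, row-major order."""
    h, w = len(img), len(img[0])
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by the patch size")
    patches = []
    for i in range(0, h, p):
        for j in range(0, w, p):
            # Flatten pixels of the patch row by row, keeping all channels of each pixel.
            patches.append([v for di in range(p) for dj in range(p) for v in img[i + di][j + dj]])
    return patches

def fuse_tokens(rgb: Image, depth: Image, w_rgb: Matrix, w_depth: Matrix, p: int) -> List[Vector]:
    """Hypothetical early fusion: embed each modality separately, then add tokens pairwise."""
    rgb_tok = [linear(x, w_rgb) for x in patchify(rgb, p)]
    dep_tok = [linear(x, w_depth) for x in patchify(depth, p)]
    if len(rgb_tok) != len(dep_tok):
        raise ValueError("RGB and depth maps must have the same spatial size")
    return [[a + b for a, b in zip(t_rgb, t_dep)] for t_rgb, t_dep in zip(rgb_tok, dep_tok)]

# Toy example: 4x4 RGB (3 channels) and 4x4 depth (1 channel), 2x2 patches, embedding dim 2.
rgb = [[[1.0, 1.0, 1.0] for _ in range(4)] for _ in range(4)]
depth = [[[2.0] for _ in range(4)] for _ in range(4)]
w_rgb = [[0.5] * 12 for _ in range(2)]   # 12 = 2 * 2 * 3 values per RGB patch
w_depth = [[1.0] * 4 for _ in range(2)]  # 4 = 2 * 2 * 1 values per depth patch
tokens = fuse_tokens(rgb, depth, w_rgb, w_depth, p=2)
print(len(tokens), tokens[0])  # 4 [14.0, 14.0]
```

Summation keeps the token count and width unchanged, so a standard ViT encoder can consume the fused sequence without modification. Concatenating along the channel axis or using cross-attention between modalities are alternative designs that would change this step.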

Findings

1. Achieves a state-of-the-art (SOTA) IoU of 0.86 on the EndoVis 2022 dataset.
2. Outperforms previous methods by at least 4%.
3. Uses a shallow, compute-efficient decoder built from ConvNeXt blocks.
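The IoU figure above is the Jaccard index between predicted and ground-truth masks: for each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c). Segmentation benchmarks usually report the mean over classes. This page does not state the paper's exact evaluation protocol, for example whether scores are averaged per image or over the whole dataset, or whether background is counted. The sketch below therefore rests on stated assumptions: dataset-level counting over flattened label maps, an `ignore_index` of 255 for unlabeled pixels, and skipping classes absent from both prediction and ground truth. The function names are illustrative.

```python
from typing import Dict, Sequence

def per_class_iou(pred: Sequence[int], target: Sequence[int],
                  num_classes: int, ignore_index: int = 255) -> Dict[int, float]:
    """Per-class IoU_c = TP_c / (TP_c + FP_c + FN_c) over flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of pixels")
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to any class
        if not (0 <= p < num_classes and 0 <= t < num_classes):
            raise ValueError(f"label out of range: pred={p}, target={t}")
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # pixel wrongly assigned to class p
            fn[t] += 1  # pixel of class t that was missed
    ious = {}
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:  # assumption: skip classes absent from both maps
            ious[c] = tp[c] / denom
    return ious

def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Unweighted mean of the per-class IoUs that are defined."""
    ious = per_class_iou(pred, target, num_classes, ignore_index)
    return sum(ious.values()) / len(ious) if ious else 0.0

pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(per_class_iou(pred, target, num_classes=3))  # {0: 0.3333333333333333, 1: 0.6666666666666666, 2: 0.5}
print(mean_iou(pred, target, num_classes=3))       # 0.5
```

In this toy case, class 1 has two correct pixels and one miss, so IoU_1 = 2/3. Class 0 has one hit, one false positive and one miss, so IoU_0 = 1/3. The unweighted mean over the three classes is 0.5.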

Abstract

Surgical scene understanding is a key technical component for enabling intelligent and context-aware systems that can transform various aspects of surgical interventions. In this work, we focus on the semantic segmentation task, propose a simple yet effective multi-modal (RGB and depth) training framework called SurgDepth, and show state-of-the-art (SOTA) results on all publicly available datasets applicable for this task. Unlike previous approaches, which either fine-tune SOTA segmentation models trained on natural images, or encode RGB or RGB-D information using RGB-only pre-trained backbones, SurgDepth, which is built on top of Vision Transformers (ViTs), is designed to encode both RGB and depth information through a simple fusion mechanism. We conduct extensive experiments on benchmark datasets including EndoVis2022, AutoLaparo, LapI2I and EndoVis2017 to verify the efficacy of…


Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Medical Imaging and Analysis · AI in cancer detection

Methods: ConvNeXt · Attentive Walk-Aggregating Graph Neural Network · Focus