Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets
Muhammad Abdullah Jamal, Omid Mohareri

TL;DR
This paper introduces SurgDepth, a simple multi-modal fusion framework based on Vision Transformers for surgical scene semantic segmentation, achieving state-of-the-art results across multiple datasets.
Contribution
SurgDepth is a novel RGB-D fusion method using ViTs that outperforms existing approaches by effectively encoding both modalities with a simple fusion mechanism.
Findings
Achieves a SOTA IoU of 0.86 on the EndoVis2022 dataset.
Outperforms previous methods by at least 4%.
Uses a shallow, compute-efficient decoder with ConvNeXt blocks.
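For readers unfamiliar with the metric behind the 0.86 figure, the following is a minimal sketch of intersection-over-union (IoU) on label maps. It is not the authors' evaluation code: the function names `class_iou` and `mean_iou`, the list-of-lists mask format, and the choice to skip classes absent from both maps are assumptions made for illustration, and benchmark protocols may average differently.

```python
def class_iou(pred, target, cls):
    """IoU for one class over two equally sized 2D label maps (lists of lists)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            inter += int(in_pred and in_target)  # pixel labeled cls in both maps
            union += int(in_pred or in_target)   # pixel labeled cls in either map
    # Assumption: a class absent from both maps has undefined IoU (NaN).
    return inter / union if union else float("nan")


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in the prediction or the target."""
    scores = [class_iou(pred, target, c) for c in range(num_classes)]
    valid = [s for s in scores if s == s]  # NaN != NaN, so this drops undefined classes
    return sum(valid) / len(valid) if valid else float("nan")


if __name__ == "__main__":
    # Hypothetical 2x3 label maps with two classes (0 = background, 1 = instrument).
    pred = [[0, 0, 1],
            [0, 1, 1]]
    target = [[0, 1, 1],
              [0, 1, 1]]
    print(class_iou(pred, target, 0))  # 2 shared pixels / 3 in the union
    print(class_iou(pred, target, 1))  # 3 shared pixels / 4 in the union
    print(mean_iou(pred, target, 2))
```

An IoU of 0.86 therefore means that, on average, the predicted and ground-truth regions share 86% of their combined area.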
Abstract
Surgical scene understanding is a key technical component for enabling intelligent and context-aware systems that can transform various aspects of surgical interventions. In this work, we focus on the semantic segmentation task, propose a simple yet effective multi-modal (RGB and depth) training framework called SurgDepth, and show state-of-the-art (SOTA) results on all publicly available datasets applicable for this task. Unlike previous approaches, which either fine-tune SOTA segmentation models trained on natural images, or encode RGB or RGB-D information using RGB-only pre-trained backbones, SurgDepth, which is built on top of Vision Transformers (ViTs), is designed to encode both RGB and depth information through a simple fusion mechanism. We conduct extensive experiments on benchmark datasets including EndoVis2022, AutoLaparo, LapI2I and EndoVis2017 to verify the efficacy of…
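The abstract excerpt does not specify how the "simple fusion mechanism" works, so the sketch below is only one plausible reading, not the SurgDepth implementation. It assumes ViT-style patch tokenization of each modality followed by element-wise addition of the RGB and depth tokens; `patchify`, `fuse_rgbd_tokens`, the patch size, the embedding width, and the random projection weights are all hypothetical.

```python
import numpy as np


def patchify(image, patch):
    """Split an (H, W, C) array into flattened non-overlapping patches of shape (N, patch*patch*C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    # Reorder to (rows of patches, cols of patches, patch_h, patch_w, C), then flatten each patch.
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)


def fuse_rgbd_tokens(rgb, depth, w_rgb, w_depth, patch):
    """Assumed fusion: project each modality's patches to a shared width and add them."""
    rgb_tokens = patchify(rgb, patch) @ w_rgb        # (N, D)
    depth_tokens = patchify(depth, patch) @ w_depth  # (N, D)
    return rgb_tokens + depth_tokens                 # fused tokens for a ViT encoder


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch, dim = 16, 8                      # hypothetical patch size and embedding width
    rgb = rng.random((32, 32, 3))           # toy RGB frame
    depth = rng.random((32, 32, 1))         # toy single-channel depth map
    w_rgb = rng.standard_normal((patch * patch * 3, dim))
    w_depth = rng.standard_normal((patch * patch * 1, dim))
    tokens = fuse_rgbd_tokens(rgb, depth, w_rgb, w_depth, patch)
    print(tokens.shape)                     # one fused token per 16x16 patch
```

Adding tokens keeps the sequence length the same as for RGB alone, which is one reason such a scheme would count as simple; concatenating tokens or channels are equally plausible alternatives that the excerpt does not rule out.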
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Medical Imaging and Analysis · AI in cancer detection
Methods: ConvNeXt · Attentive Walk-Aggregating Graph Neural Network · Focus
