Surg-SegFormer: A Dual Transformer-Based Model for Holistic Surgical Scene Segmentation

Fatimaelzahraa Ahmed; Muraam Abdel-Ghani; Muhammad Arsalan; Mahmoud Ali; Abdulaziz Al-Ali; Shidin Balakrishnan

arXiv:2507.04304 · eess.IV · July 8, 2025

TL;DR

Surg-SegFormer is a novel dual transformer-based model designed for real-time, prompt-free surgical scene segmentation, outperforming existing methods and aiding surgical training and analysis.

Contribution

It introduces a prompt-free, dual transformer architecture that achieves state-of-the-art segmentation performance on surgical datasets.

Findings

1. Achieved a mean IoU of 0.80 on EndoVis2018
2. Achieved a mean IoU of 0.54 on EndoVis2017
3. Outperforms current state-of-the-art segmentation models
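Both numeric findings are reported as mean Intersection-over-Union (IoU). The sketch below illustrates the metric: for each class c, IoU is |P ∩ G| / |P ∪ G|, where P is the set of pixels predicted as c and G is the set of ground-truth pixels labelled c. The per-class scores are then averaged. It assumes flattened label maps with integer class IDs and skips classes that appear in neither map. The paper's exact evaluation protocol (per-frame versus per-dataset averaging, treatment of background or absent classes) is not given here, so `mean_iou` is a hypothetical helper and not the authors' evaluation code.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection-over-Union across classes for flattened label maps.

    Assumption: a class absent from both prediction and ground truth is
    excluded from the mean. Evaluation protocols differ on this point.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of pixels")

    ious: List[float] = []
    for c in range(num_classes):
        # Pixels labelled c in both maps.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either map.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union == 0:
            continue  # class c appears in neither map, so it is skipped
        ious.append(inter / union)

    # Average the per-class IoUs. Return 0.0 if no class was present at all.
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3)` yields per-class IoUs of 1/3, 2/3 and 1/2, for a mean of 0.5. Averaging over classes, rather than over pixels, keeps small but clinically important structures such as vessels from being swamped by large tissue regions.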

Abstract

Holistic surgical scene segmentation in robot-assisted surgery (RAS) enables surgical residents to identify various anatomical tissues, articulated tools, and critical structures, such as veins and vessels. Given the firm intraoperative time constraints, it is challenging for surgeons to provide detailed real-time explanations of the operative field for trainees. This challenge is compounded by the scarcity of expert surgeons relative to trainees, which makes it difficult to delineate go and no-go zones unambiguously. High-performance semantic segmentation models can help by providing clear postoperative analyses of surgical procedures. However, recent advanced segmentation models rely on user-generated prompts, rendering them impractical for lengthy surgical videos that commonly exceed an hour. To address this challenge, we introduce Surg-SegFormer, a novel…
