TL;DR
This paper introduces a lightweight deep learning model with auxiliary supervision and adversarial training for real-time segmentation of surgical instruments in robotic surgery videos, enhancing accuracy and speed.
Contribution
It presents a novel cascaded CNN with multi-resolution feature fusion, auxiliary and adversarial loss, and spatial pyramid pooling for improved real-time instrument segmentation.
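To make the spatial pyramid pooling component concrete, here is a minimal pure-Python sketch of the general technique: a feature map is average-pooled into grids of several sizes and the results are concatenated, so the output mixes global and local context. The function names, the average-pooling choice, and the pyramid levels `(1, 2, 4)` are illustrative assumptions; this summary does not state the paper's actual configuration.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D list `fmap` (H x W) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Floor for the start and ceil for the end, so the bins cover the whole map.
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Concatenate pooled grids from every pyramid level into one flat vector.

    `levels` is a hypothetical choice, not taken from the paper.
    """
    features = []
    for bins in levels:
        for row in adaptive_avg_pool(fmap, bins):
            features.extend(row)
    return features


# Usage: a 4 x 4 single-channel feature map holding the values 0..15.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]
vector = spatial_pyramid_pool(fmap)  # 1 + 4 + 16 = 21 pooled values
```

The output length depends only on the pyramid levels, not on the input resolution. In segmentation networks the pooled grids are usually upsampled back to the feature-map size and concatenated channel-wise rather than flattened; the flat vector here just keeps the sketch short.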
Findings
Outperforms existing algorithms in segmentation accuracy
Achieves faster prediction times on high-resolution videos
Effectively learns structural information through adversarial loss
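The way a segmentation loss, an auxiliary loss, and an adversarial loss are typically combined can be sketched as a weighted sum. The sketch below assumes binary (instrument vs. background) per-pixel probabilities, a discriminator that outputs the probability that a predicted mask looks like a real annotation, and hypothetical weights `aux_weight` and `adv_weight`; none of these specifics come from the paper.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def segmentation_network_loss(main_probs, aux_probs, targets, disc_score,
                              aux_weight=0.4, adv_weight=0.01):
    """Weighted sum of main, auxiliary, and adversarial terms.

    main_probs: per-pixel predictions from the final output branch
    aux_probs:  per-pixel predictions from an intermediate (auxiliary) branch
    disc_score: discriminator's probability that the predicted mask is "real"
    aux_weight, adv_weight: hypothetical weights, not values from the paper
    """
    seg = binary_cross_entropy(main_probs, targets)
    aux = binary_cross_entropy(aux_probs, targets)
    # The segmentation network is rewarded when the discriminator is fooled,
    # i.e. when disc_score is pushed toward 1.
    adv = binary_cross_entropy([disc_score], [1.0])
    return seg + aux_weight * aux + adv_weight * adv


# Usage: two pixels, one instrument (1) and one background (0).
loss = segmentation_network_loss([0.9, 0.2], [0.7, 0.4], [1, 0], disc_score=0.3)
```

The adversarial term depends on the whole predicted mask rather than on individual pixels, which is the usual explanation for why such a loss can encourage structurally plausible outputs.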
Abstract
Robot-assisted surgery is an emerging technology that has grown rapidly with the development of robotics and imaging systems. Innovations in vision, haptics and accurate movements of robot arms have enabled surgeons to perform precise minimally invasive surgeries. Real-time semantic segmentation of the robotic instruments and tissues is a crucial step in robot-assisted surgery. Accurate and efficient segmentation of the surgical scene not only aids in the identification and tracking of instruments but also provides contextual information about the tissues being operated on and the instruments in use. For this purpose, we have developed a light-weight cascaded convolutional neural network (CNN) to segment the surgical instruments from high-resolution videos obtained from a commercial robotic system. We propose a multi-resolution feature fusion module (MFF) to fuse the feature…
Taxonomy
Methods: Spatial Pyramid Pooling
