Searching for Efficient Architecture for Instrument Segmentation in Robotic Surgery

Daniil Pakhomov; Nassir Navab

arXiv:2007.04449·cs.CV·July 10, 2020

TL;DR

This paper introduces a lightweight, efficient deep residual network optimized for real-time high-resolution instrument segmentation in robotic surgery, balancing speed and accuracy.

Contribution

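The authors design a light-weight deep residual network for real-time inference and, to recover the accuracy lost to its small size without adding computational cost, run a differentiable search over the dilation rates of its residual units, reporting state-of-the-art real-time segmentation performance on surgical images.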

Findings

1. Achieves up to 125 FPS on high-resolution images
2. Outperforms existing methods in the speed-accuracy tradeoff
3. Validated on the EndoVis 2017 dataset

Abstract

Segmentation of surgical instruments is an important problem in robot-assisted surgery: it is a crucial step towards full instrument pose estimation and is directly used for masking of augmented reality overlays during surgical procedures. Most applications rely on accurate real-time segmentation of high-resolution surgical images. While previous research focused primarily on methods that deliver high-accuracy segmentation masks, the majority of them cannot be used for real-time applications due to their computational cost. In this work, we design a light-weight and highly efficient deep residual architecture which is tuned to perform real-time inference on high-resolution images. To account for the reduced accuracy of the discovered light-weight deep residual network and avoid any additional computational burden, we perform a differentiable search over dilation rates for residual units…
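
The abstract is cut off before it describes the search itself, so the following is only a rough sketch of what a differentiable search over dilation rates can look like, not the authors' implementation. It assumes a DARTS-style continuous relaxation: each residual unit mixes several candidate dilated convolutions with softmax weights, and the strongest candidate is kept once the search ends. The 1D signal, the candidate set (1, 2, 4), and every function name here are hypothetical choices made for illustration; a real search would also update the architecture weights by gradient descent, which this sketch leaves out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of architecture logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def mixed_residual_unit(signal, kernel, alphas, dilations=(1, 2, 4)):
    """Residual unit whose conv branch is a softmax-weighted mix of candidates.

    `alphas` are the (assumed) architecture parameters, one per dilation rate.
    """
    weights = softmax(alphas)
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    mixed = [
        sum(w * branch[i] for w, branch in zip(weights, branches))
        for i in range(len(signal))
    ]
    # Skip connection: the input is added back to the mixed branch output.
    return [x + m for x, m in zip(signal, mixed)]

def select_dilation(alphas, dilations=(1, 2, 4)):
    """After the search, keep only the candidate with the largest weight."""
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return dilations[best]

# Usage: an impulse input, with equal weights on all candidates.
signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
kernel = [1.0, 0.0, 1.0]
output = mixed_residual_unit(signal, kernel, alphas=[0.0, 0.0, 0.0])
chosen = select_dilation([0.1, 2.0, -1.0])
```

Because every candidate uses the same kernel size, the unit kept after selection costs the same as an undilated one, which is consistent with the abstract's aim of improving accuracy without extra computation.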
