TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery

Zixu Zhao; Yueming Jin; Pheng-Ann Heng

arXiv:2202.08453·cs.CV·February 18, 2022

TL;DR

TraSeTR is a novel transformer-based approach that improves instrument segmentation in robotic surgery by integrating tracking cues and contrastive learning, achieving state-of-the-art results on multiple datasets.

Contribution

Introduces TraSeTR, a Track-to-Segment Transformer that leverages temporal tracking and contrastive query learning for enhanced instrument segmentation in surgical videos.

Findings

01 Achieves state-of-the-art segmentation accuracy on three public datasets.

02 Effectively handles large temporal variations with contrastive query learning.

03 Demonstrates superior performance over previous methods in instrument type discrimination.

Abstract

Surgical instrument segmentation, in general a pixel classification task, is fundamentally crucial for promoting cognitive intelligence in robot-assisted surgery (RAS). However, previous methods struggle to discriminate instrument types and instances. To address these issues, we explore a mask classification paradigm that produces per-segment predictions. We propose TraSeTR, a novel Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation. TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions, i.e., a set of class-bbox-mask pairs, by decoding query embeddings. Specifically, we introduce a prior query, encoded with previous temporal knowledge, to transfer tracking signals to current instances via identity matching. A contrastive query learning strategy is further…
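The identity matching mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each instrument is represented by a plain embedding vector, scores prior-frame and current-frame embeddings by cosine similarity, and picks the one-to-one assignment with the highest total similarity by brute force. The function names `cosine` and `match_identities` and the toy vectors are hypothetical.

```python
import math
from itertools import permutations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_identities(prior, current):
    """Assign each prior-frame embedding to a distinct current-frame embedding.

    Hypothetical helper: maximizes total cosine similarity by trying every
    one-to-one assignment. This is only practical for a handful of
    instruments and assumes len(prior) <= len(current).
    Returns a list where entry i is the index in `current` matched to prior[i].
    """
    sim = [[cosine(p, c) for c in current] for p in prior]
    best, best_score = None, -math.inf
    for perm in permutations(range(len(current)), len(prior)):
        score = sum(sim[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = list(perm), score
    return best


# Toy example: two tracked instruments, three candidates in the current frame.
prior = [[1.0, 0.0], [0.0, 1.0]]
current = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
assignment = match_identities(prior, current)
print(assignment)
```

In this toy case the first tracked instrument is matched to the second candidate and the second tracked instrument to the first, while the third candidate stays unmatched and would be treated as a new instance. A practical system would likely replace the brute-force search with a polynomial-time assignment solver.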

Taxonomy

Topics: Surgical Simulation and Training · Medical Image Segmentation Techniques · Intraocular Surgery and Lenses

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Label Smoothing · Position-Wise Feed-Forward Layer