TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery
Zixu Zhao, Yueming Jin, Pheng-Ann Heng

TL;DR
TraSeTR is a novel transformer-based approach that improves instrument segmentation in robotic surgery by integrating tracking cues and contrastive learning, achieving state-of-the-art results on multiple datasets.
Contribution
Introduces TraSeTR, a Track-to-Segment Transformer that leverages temporal tracking and contrastive query learning for enhanced instrument segmentation in surgical videos.
Findings
Achieves state-of-the-art segmentation accuracy on three public datasets.
Effectively handles large temporal variations with contrastive query learning.
Demonstrates superior performance over previous methods in instrument type discrimination.
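The summary above does not give the loss used for contrastive query learning. As a rough illustration of the general idea only, the sketch below uses a standard InfoNCE-style objective. It pulls an instrument's query embedding toward an embedding of the same instrument from another frame and pushes it away from embeddings of other instruments. The function name `info_nce`, the temperature value, and the toy vectors are assumptions for this example and are not taken from the paper.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for a single anchor query embedding (illustrative only).

    anchor:    (D,)   embedding of an instrument query in the current frame
    positive:  (D,)   embedding of the same instrument from another frame
    negatives: (K, D) embeddings of other instruments
    """
    def unit(v):
        # L2-normalise along the last axis so dot products are cosine similarities.
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a, p, n = unit(anchor), unit(positive), unit(negatives)
    # Index 0 holds the positive pair; the rest are negatives.
    logits = np.concatenate([[a @ p], n @ a]) / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    log_prob_positive = logits[0] - np.log(np.exp(logits).sum())
    return float(-log_prob_positive)

# Toy usage (hypothetical 2-D embeddings): a matching positive gives a lower loss.
good = info_nce([1, 0], [1, 0], [[0, 1], [-1, 0]])
bad = info_nce([1, 0], [0, 1], [[1, 0], [-1, 0]])
```

Here `good` is close to zero because the positive is identical to the anchor, while `bad` is large because a negative is more similar to the anchor than the positive is.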
Abstract
Surgical instrument segmentation -- in general a pixel classification task -- is fundamentally crucial for promoting cognitive intelligence in robot-assisted surgery (RAS). However, previous methods struggle to discriminate instrument types and instances. To address these issues, we explore a mask classification paradigm that produces per-segment predictions. We propose TraSeTR, a novel Track-to-Segment Transformer that wisely exploits tracking cues to assist surgical instrument segmentation. TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions, i.e., a set of class-bbox-mask pairs, by decoding query embeddings. Specifically, we introduce the prior query, which is encoded with previous temporal knowledge, to transfer tracking signals to current instances via identity matching. A contrastive query learning strategy is further…
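The abstract mentions transferring tracking signals to current instances via identity matching, but the truncated text does not say how the matching is done. The sketch below is one simple way such a step might look, not the authors' procedure: a greedy one-to-one assignment on cosine similarity between embeddings of previously tracked instruments and embeddings of current instances. The function name `match_identities`, the `min_sim` threshold, and the toy embeddings are all hypothetical.

```python
import numpy as np

def match_identities(prior, current, min_sim=0.5):
    """Greedily match current instances to prior tracks (illustrative only).

    prior:   (P, D) embeddings of instruments tracked in earlier frames
    current: (C, D) embeddings of instances predicted in the current frame
    Returns a list of length C: the matched prior index, or -1 if unmatched.
    """
    prior = np.asarray(prior, dtype=float)
    current = np.asarray(current, dtype=float)
    # L2-normalise rows so the matrix product gives cosine similarities.
    prior = prior / np.linalg.norm(prior, axis=1, keepdims=True)
    current = current / np.linalg.norm(current, axis=1, keepdims=True)
    sim = current @ prior.T  # shape (C, P)

    assignment = [-1] * len(current)
    used_prior = set()
    # Visit (current, prior) pairs from most to least similar.
    for flat_index in np.argsort(-sim, axis=None):
        i, j = (int(k) for k in np.unravel_index(flat_index, sim.shape))
        if sim[i, j] < min_sim:
            break  # every remaining pair is below the assumed threshold
        if assignment[i] == -1 and j not in used_prior:
            assignment[i] = j
            used_prior.add(j)
    return assignment

# Toy usage: two known tracks, three current instances (the last one is new).
prior_tracks = [[1.0, 0.0], [0.0, 1.0]]
current_instances = [[0.1, 1.0], [1.0, 0.1], [-1.0, -1.0]]
matches = match_identities(prior_tracks, current_instances)
```

An instance left at `-1` would be treated as a newly appearing instrument. An optimal assignment (for example the Hungarian algorithm) could replace the greedy loop when several instruments look alike.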
Taxonomy
Topics: Surgical Simulation and Training · Medical Image Segmentation Techniques · Intraocular Surgery and Lenses
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Label Smoothing · Position-Wise Feed-Forward Layer
