SOIT: Segmenting Objects with Instance-Aware Transformers

Xiaodong Yu; Dahu Shi; Xing Wei; Ye Ren; Tingqun Ye; Wenming Tan

arXiv:2112.11037·cs.CV·December 24, 2021

TL;DR

SOIT is a novel end-to-end instance segmentation framework using instance-aware transformers that eliminates the need for traditional components like RoI cropping and NMS, achieving superior results on MS COCO.

Contribution

It introduces a single-stage, RoI- and NMS-free instance segmentation method based on set prediction with transformers, improving efficiency and accuracy.

Findings

01 Outperforms state-of-the-art on MS COCO

02 Eliminates need for RoI and NMS components

03 Joint learning improves detection performance

Abstract

This paper presents an end-to-end instance segmentation framework, termed SOIT, that Segments Objects with Instance-aware Transformers. Inspired by DETR (Carion et al., 2020), our method views instance segmentation as a direct set prediction problem and effectively removes the need for many hand-crafted components like RoI cropping, one-to-many label assignment, and non-maximum suppression (NMS). In SOIT, multiple queries are learned to directly reason a set of object embeddings of semantic category, bounding-box location, and pixel-wise mask in parallel under the global image context. The class and bounding-box can be easily embedded by a fixed-length vector. The pixel-wise mask, especially, is embedded by a group of parameters to construct a lightweight instance-aware transformer. Afterward, a full-resolution mask is produced by the instance-aware transformer without involving any…
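The idea of embedding a mask as "a group of parameters" that builds a small per-instance network can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the layer layout (one single-head attention layer plus a linear output), all sizes, the random projection weights, and every function name below are assumptions chosen for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def split_mask_params(theta, dim):
    """Reshape a flat vector into Q/K/V matrices and an output vector (hypothetical layout)."""
    n = dim * dim
    wq, wk, wv = (theta[i * n:(i + 1) * n].reshape(dim, dim) for i in range(3))
    wo = theta[3 * n:3 * n + dim]
    return wq, wk, wv, wo

def instance_mask(features, theta):
    """Apply one tiny attention layer, parameterised by theta, to an (H, W, C) feature map."""
    h, w, c = features.shape
    x = features.reshape(h * w, c)
    wq, wk, wv, wo = split_mask_params(theta, c)
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # (HW, HW) attention over pixels
    logits = (attn @ v) @ wo                       # one mask logit per pixel
    return sigmoid(logits).reshape(h, w)

rng = np.random.default_rng(0)
N, D, K, C, H, W = 3, 16, 5, 4, 8, 8   # queries, query dim, classes, feature dim, map size
P = 3 * C * C + C                      # size of the per-instance parameter vector

queries = rng.normal(size=(N, D))      # stand-in for decoder outputs, one row per object query
w_cls = rng.normal(size=(D, K)) * 0.1  # random stand-ins for learned prediction heads
w_box = rng.normal(size=(D, 4)) * 0.1
w_mask = rng.normal(size=(D, P)) * 0.1

class_logits = queries @ w_cls         # fixed-length class embedding per query
boxes = sigmoid(queries @ w_box)       # fixed-length, normalised box per query
thetas = queries @ w_mask              # per-query parameters of the mask network

features = rng.normal(size=(H, W, C))  # stand-in for shared image features
masks = np.stack([instance_mask(features, t) for t in thetas])
print(class_logits.shape, boxes.shape, masks.shape)
```

Each query yields its own parameter vector, so the same shared feature map produces a different mask per instance without cropping a region of interest; how SOIT actually structures and trains these parameters is described in the paper, not here.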

Code & Models

Repositories

yuxiaodonghri/soit (Official)

Videos

SOIT: Segmenting Objects with Instance-Aware Transformers

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Adam · Layer Normalization · Convolution · Feedforward Network · Absolute Position Encodings