ISTR: End-to-End Instance Segmentation with Transformers

Jie Hu; Liujuan Cao; Yao Lu; ShengChuan Zhang; Yan Wang; Ke Li; Feiyue Huang; Ling Shao; Rongrong Ji

arXiv:2105.00637 · cs.CV · May 7, 2021 · 54 cites

TL;DR

ISTR is the first end-to-end Transformer-based framework for instance segmentation. It predicts low-dimensional mask embeddings and applies recurrent refinement, reaching state-of-the-art results on MS COCO.

Contribution

The paper proposes an end-to-end instance segmentation Transformer that predicts mask embeddings and employs a recurrent refinement strategy, removing non-end-to-end components such as non-maximum suppression.

Findings

1. Achieves 46.8/38.6 box/mask AP with ResNet50-FPN on COCO

2. Achieves 48.1/39.9 box/mask AP with ResNet101-FPN on COCO

3. Demonstrates strong potential as a baseline for instance-level recognition

Abstract

End-to-end paradigms significantly improve the accuracy of various deep-learning-based computer vision models. To this end, tasks like object detection have been upgraded by replacing non-end-to-end components, such as removing non-maximum suppression by training with a set loss based on bipartite matching. However, such an upgrade is not applicable to instance segmentation, due to its significantly higher output dimensions compared to object detection. In this paper, we propose an instance segmentation Transformer, termed ISTR, which is the first end-to-end framework of its kind. ISTR predicts low-dimensional mask embeddings, and matches them with ground truth mask embeddings for the set loss. Besides, ISTR concurrently conducts detection and segmentation with a recurrent refinement strategy, which provides a new way to achieve instance segmentation compared to the existing top-down…
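The set loss described in the abstract rests on a one-to-one (bipartite) matching between predicted and ground-truth mask embeddings. The following is a minimal sketch of that matching step only, not the authors' implementation. The cosine-distance cost, the function names `cosine_distance` and `match_embeddings`, and the toy 3-dimensional embeddings are all assumptions made for illustration. The matching uses exhaustive search to stay dependency-free; a practical system would more likely use the Hungarian algorithm.

```python
from itertools import permutations
from math import sqrt


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def match_embeddings(predictions, targets):
    """Assign each target to a distinct prediction with minimum total cost.

    Hypothetical helper: exhaustive search over assignments, so it is only
    practical for a handful of instances.
    Returns (pairs, total_cost); pairs holds (prediction_idx, target_idx).
    """
    assert len(predictions) >= len(targets), "need at least one prediction per target"
    best_pairs, best_cost = None, float("inf")
    # Try every ordered choice of len(targets) distinct predictions.
    for pred_indices in permutations(range(len(predictions)), len(targets)):
        cost = sum(
            cosine_distance(predictions[p], targets[t])
            for t, p in enumerate(pred_indices)
        )
        if cost < best_cost:
            best_pairs = [(p, t) for t, p in enumerate(pred_indices)]
            best_cost = cost
    return best_pairs, best_cost


# Toy data (assumed): three predicted embeddings, two ground-truth embeddings.
predicted = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
ground_truth = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

pairs, cost = match_embeddings(predicted, ground_truth)
print(sorted(pairs))  # [(0, 1), (2, 0)]
```

In this toy case, prediction 1 is left unmatched. In a set-loss setup, such unmatched predictions are typically trained toward a "no object" class, which is what makes a separate duplicate-removal step such as non-maximum suppression unnecessary.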

Code & Models

Repositories

hujiecpp/ISTR (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Multi-Head Attention · Adam · Layer Normalization · Residual Connection · Label Smoothing · Byte Pair Encoding · Dropout