SOTR: Segmenting Objects with Transformers

Ruohao Guo; Dantong Niu; Liao Qu; Zhenbo Li

arXiv:2108.06747·cs.CV·August 18, 2021

TL;DR

SOTR introduces a transformer-based instance segmentation model that simplifies the pipeline, efficiently captures context, and outperforms existing methods on the MS COCO dataset.

Contribution

The paper presents a novel transformer-based framework for instance segmentation that is flexible, efficient, and improves accuracy over prior approaches.

Findings

01

SOTR achieves superior performance on the MS COCO dataset.

02

The twin transformer reduces computational resources while maintaining accuracy.

03

SOTR can be integrated with various CNN backbones for enhanced results.

Abstract

Most recent transformer-based models show impressive performance on vision tasks, even better than Convolutional Neural Networks (CNNs). In this work, we present a novel, flexible, and effective transformer-based model for high-quality instance segmentation. The proposed method, Segmenting Objects with TRansformers (SOTR), simplifies the segmentation pipeline, building on an alternative CNN backbone appended with two parallel subtasks: (1) predicting the per-instance category via a transformer and (2) dynamically generating the segmentation mask with the multi-level upsampling module. SOTR can effectively extract lower-level feature representations and capture long-range context dependencies by means of a Feature Pyramid Network (FPN) and a twin transformer, respectively. Meanwhile, compared with the original transformer, the proposed twin transformer is time- and resource-efficient since only a row and a column…
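
The efficiency claim in the abstract can be illustrated with a rough count of attention pairs. The sketch below is not taken from the SOTR code. It assumes that the twin transformer restricts each token to attending within its own row and then within its own column of an H×W feature grid, which is one reading of the truncated sentence above. The function names are hypothetical and exist only for this illustration.

```python
def full_attention_pairs(h: int, w: int) -> int:
    """Query-key pairs when every token attends to every other token."""
    n = h * w
    return n * n


def row_column_attention_pairs(h: int, w: int) -> int:
    """Query-key pairs under the assumed row-then-column scheme.

    Assumption (not confirmed by the truncated abstract): each token
    attends only within its own row, then only within its own column.
    """
    row_pairs = h * (w * w)  # h rows, each with w * w pairs
    col_pairs = w * (h * h)  # w columns, each with h * h pairs
    return row_pairs + col_pairs


if __name__ == "__main__":
    for h, w in [(16, 16), (32, 32), (64, 64)]:
        full = full_attention_pairs(h, w)
        axial = row_column_attention_pairs(h, w)
        print(f"{h}x{w}: full={full}, row+column={axial}, ratio={full / axial:.1f}")
```

Under this assumption the pair count drops from (H·W)² to H·W·(H + W), so the saving grows with the grid size: for a square grid of side N the ratio is N/2.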

Code & Models

Repositories

easton-cau/SOTR
PyTorch · Official

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections · Softmax