Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation
Yuhui Yuan, Xiaokang Chen, Xilin Chen, Jingdong Wang

TL;DR
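This paper introduces an object-contextual representation method for semantic segmentation that characterizes each pixel by the representation of the object it belongs to. It frames context aggregation with a Transformer-based architecture, achieves first place on the Cityscapes leaderboard, and reports competitive results on several other benchmarks.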
Contribution
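It proposes an object-contextual representation scheme, expressed with a Transformer-based architecture, in which each pixel representation is augmented with a weighted aggregation of object region representations. This gives each pixel explicit context about the objects in the image rather than only about its spatial neighborhood.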
Findings
Achieves first place on the Cityscapes leaderboard.
Demonstrates competitive results on ADE20K, LIP, PASCAL-Context, and COCO-Stuff.
Validates the effectiveness of object-contextual representations within a Transformer architecture.
Abstract
In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy. Our motivation is that the label of a pixel is the category of the object that the pixel belongs to. We present a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. Last, we compute the relation between each pixel and each object region and augment the representation of each pixel with the object-contextual representation which is a weighted aggregation of all the object region representations according to their relations with the…
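The three steps in the abstract can be sketched in a few lines of NumPy. This is a minimal sketch under simplifying assumptions, not the authors' implementation: the coarse per-class scores from step one are taken as a given input, any learned transforms (for example 1x1 convolutions) applied before the similarity and fusion steps are replaced by identities, the pixel-region relation is a plain dot product normalized with a softmax over regions, and "augment" is taken to mean simple concatenation. The names `object_contextual_representation`, `softmax`, `pixels`, and `region_logits` are hypothetical, introduced only for illustration. Read this way, step three has the form of a single-head cross-attention with pixels as queries and region representations as keys and values, which is one way to see the link to Transformer attention suggested by the title.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def object_contextual_representation(pixels: np.ndarray,
                                     region_logits: np.ndarray) -> np.ndarray:
    """Augment each pixel feature with an object-contextual feature.

    pixels:        (N, C) pixel representations, N = H * W flattened positions.
    region_logits: (K, N) coarse scores, one row per object class/region.
                   Assumed to come from a segmentation head trained with
                   ground-truth supervision (step 1 of the abstract).
    Returns:       (N, 2C) each pixel feature concatenated with its context.
    """
    # Step 1 (soft object regions): normalize each region's scores over all
    # pixels so that every row of `region_weights` sums to 1.
    region_weights = softmax(region_logits, axis=1)       # (K, N)

    # Step 2 (object region representations): weighted sum of the pixel
    # features lying in each soft region.
    region_feats = region_weights @ pixels                 # (K, C)

    # Step 3a (pixel-region relations): dot-product similarity, normalized
    # over the K regions for each pixel. Identity transforms are assumed.
    relation = softmax(pixels @ region_feats.T, axis=1)    # (N, K)

    # Step 3b (object-contextual representation): relation-weighted
    # aggregation of all region representations for each pixel.
    context = relation @ region_feats                      # (N, C)

    # Augment: concatenate the original pixel feature with its context
    # (a learned fusion layer is omitted in this sketch).
    return np.concatenate([pixels, context], axis=1)       # (N, 2C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C, K = 4, 5, 8, 3
    pixels = rng.standard_normal((H * W, C))
    region_logits = rng.standard_normal((K, H * W))
    out = object_contextual_representation(pixels, region_logits)
    print(out.shape)  # (20, 16)
```

Because both softmaxes produce convex weights, every context vector is a convex combination of the region representations, and every region representation is a convex combination of pixel features. With a single region (K = 1), every pixel therefore receives the same context vector.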
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Average Pooling · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling
