Semantic Segmentation by Early Region Proxy

Yifan Zhang; Bo Pang; Cewu Lu

arXiv:2203.14043 · cs.CV · March 29, 2022

TL;DR

This paper introduces RegProxy, a region-based Transformer model for semantic segmentation that predicts one label per region rather than per pixel, and reports a better performance-efficiency trade-off than conventional dense-prediction methods.

Contribution

It proposes a region proxy approach that models image regions with learnable, flexible geometries and encodes them using Transformer self-attention, eliminating the need for dense pixel-wise prediction.

Findings

01. Outperforms CNN-based models while using fewer parameters and less computation.
02. Achieves 52.9 mIoU on ADE20K, surpassing the previous state of the art.
03. Demonstrates a superior performance-efficiency trade-off.
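
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from predicted and ground-truth label maps. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper or its repository; benchmark protocols typically accumulate the confusion matrix over the whole validation set and report the result as a percentage.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target (hypothetical helper)."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    true_pos = np.diag(conf)
    # Union = predicted-as-class + labelled-as-class - intersection.
    union = conf.sum(axis=0) + conf.sum(axis=1) - true_pos
    valid = union > 0  # skip classes absent from both maps
    return float((true_pos[valid] / union[valid]).mean())

# Toy example: four pixels, two classes present out of three.
toy_pred = [0, 0, 1, 1]
toy_target = [0, 1, 1, 1]
toy_score = mean_iou(toy_pred, toy_target, num_classes=3)
print(f"mIoU = {100 * toy_score:.1f}")
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.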

Abstract

Typical vision backbones manipulate structured features. As a compromise, semantic segmentation has long been modeled as per-point prediction on dense regular grids. In this work, we present a novel and efficient modeling that starts from interpreting the image as a tessellation of learnable regions, each of which has flexible geometry and carries homogeneous semantics. To model region-wise context, we use a Transformer to encode regions in a sequence-to-sequence manner by applying multi-layer self-attention on the region embeddings, which serve as proxies of specific regions. Semantic segmentation is then carried out as per-region prediction on top of the encoded region embeddings using a single linear classifier, so a decoder is no longer needed. The proposed RegProxy model discards the common Cartesian feature layout and operates purely at the region level. Hence, it exhibits the…
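
The pipeline the abstract describes (self-attention over region embeddings, then a single linear classifier per region) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses one single-head attention layer with random weights instead of a multi-layer Transformer, and the soft `pixel_to_region` assignment matrix used to turn region predictions into a pixel map is an assumption of this sketch that the abstract does not spell out. All names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(regions, w_q, w_k, w_v):
    """One single-head self-attention layer with a residual connection."""
    q, k, v = regions @ w_q, regions @ w_k, regions @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, N) region-to-region affinities
    return regions + softmax(scores) @ v

def segment(regions, attn_weights, w_cls, b_cls, pixel_to_region):
    encoded = self_attention(regions, *attn_weights)  # (N, D) encoded region proxies
    region_logits = encoded @ w_cls + b_cls           # (N, K) single linear classifier
    # Assumption: each pixel mixes the class probabilities of its regions.
    pixel_probs = pixel_to_region @ softmax(region_logits)  # (P, K)
    return region_logits, pixel_probs.argmax(axis=-1)

rng = np.random.default_rng(0)
num_regions, dim, num_classes, num_pixels = 16, 32, 5, 64
regions = rng.normal(size=(num_regions, dim))
attn_weights = tuple(
    rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3)
)
w_cls = rng.normal(size=(dim, num_classes))
b_cls = np.zeros(num_classes)
# Each row is a distribution over regions for one pixel (random here).
pixel_to_region = softmax(rng.normal(size=(num_pixels, num_regions)))

region_logits, pixel_labels = segment(
    regions, attn_weights, w_cls, b_cls, pixel_to_region
)
print(region_logits.shape, pixel_labels.shape)
```

The point of the design is that classification happens on `num_regions` embeddings rather than on every pixel; the pixel map is recovered only at the end through the region assignment, so no dense decoder is involved.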

Code & Models

Repositories

yif-zhang/regionproxy (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Residual Connection · Softmax · Dropout · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding