SDTP: Semantic-aware Decoupled Transformer Pyramid for Dense Image Prediction

Zekun Li; Yufan Liu; Bing Li; Weiming Hu; Kebin Wu; Pei Wang

arXiv:2109.08963 · cs.CV · September 21, 2021

TL;DR

This paper introduces SDTP, a novel transformer pyramid architecture that enhances multi-scale dense image prediction by exploiting semantic diversity and efficient cross-level interaction, outperforming existing methods.

Contribution

The paper proposes a new Semantic-aware Decoupled Transformer Pyramid with three key components, improving multi-scale feature interaction and semantic diversity handling in dense prediction tasks.

Findings

01. Outperforms state-of-the-art methods in dense image prediction.

02. Components are plug-and-play and adaptable to other models.

03. Effectively models multi-scale semantic interactions with reduced computation.

Abstract

Although transformers have achieved great progress on computer vision tasks, scale variation in dense image prediction remains a key challenge. Few effective multi-scale techniques have been applied to transformers, and current methods have two main limitations. On one hand, the self-attention module in the vanilla transformer fails to sufficiently exploit the diversity of semantic information because of its rigid mechanism. On the other hand, it is hard to build attention and interaction among different levels due to the heavy computational burden. To alleviate these problems, we first revisit the multi-scale problem in dense prediction, verifying the significance of diverse semantic representation and multi-scale interaction, and exploring the adaptation of the transformer to a pyramidal structure. Inspired by these findings, we propose a novel Semantic-aware Decoupled Transformer Pyramid (SDTP)…
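To give a rough sense of the computational burden the abstract refers to, the sketch below counts query-key pairs when self-attention is applied within each pyramid level versus jointly across all levels. It is not the SDTP method. The 224x224 input size, the strides 4, 8, 16 and 32, and all function names are assumptions made for illustration, and the count ignores attention heads and channel width.

```python
def tokens_per_level(height, width, strides):
    """Token count at each pyramid level, assuming one token per stride-sized cell."""
    return [(height // s) * (width // s) for s in strides]


def attention_pairs_per_level(token_counts):
    """Query-key pairs when each level attends only to itself."""
    return sum(n * n for n in token_counts)


def attention_pairs_joint(token_counts):
    """Query-key pairs when tokens from all levels attend to each other."""
    total = sum(token_counts)
    return total * total


if __name__ == "__main__":
    # Hypothetical setup: 224x224 input, four-level pyramid (assumed strides).
    strides = (4, 8, 16, 32)
    counts = tokens_per_level(224, 224, strides)
    within = attention_pairs_per_level(counts)
    joint = attention_pairs_joint(counts)
    print("tokens per level:", counts)
    print("pairs, within-level attention:", within)
    print("pairs, joint cross-level attention:", joint)
    print("joint / within ratio:", round(joint / within, 2))
```

Under these assumptions the finest level dominates both counts, which suggests why cross-level interaction is usually restricted or approximated rather than computed densely.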

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Dense Connections · Label Smoothing · Multi-Head Attention · Byte Pair Encoding · Softmax