Dual Progressive Transformations for Weakly Supervised Semantic Segmentation

Dongjian Huo; Yukun Su; Qingyao Wu

arXiv:2209.15211 · cs.CV · October 3, 2022

TL;DR

This paper introduces CRT, a novel CNN-refined transformer approach that improves weakly supervised semantic segmentation by generating more accurate class activation maps, achieving state-of-the-art results on benchmark datasets.

Contribution

The paper proposes a CNN-refined transformer (CRT) that effectively combines global attention and local accuracy for WSSS, addressing over-activation issues of pure transformers.
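The "global attention" CRT borrows from the transformer can be illustrated with standard single-head scaled dot-product self-attention (the mechanism from "Attention Is All You Need", listed under Methods below). This is a minimal NumPy sketch under assumed shapes, not code from the huodongjian0603/crt repository; the names `self_attention`, `w_q`, `w_k`, `w_v` and the toy sizes are illustrative assumptions. Unlike a convolution, whose output at each position depends only on a local window, every patch's output here is a softmax-weighted mix of all patches.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    tokens: (N, D) patch embeddings; w_q, w_k, w_v: (D, d) projections.
    Returns the attended tokens (N, d) and the attention matrix (N, N).
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Row i weights the values of all N patches, so every output position
    # has a global receptive field (unlike a local convolution kernel).
    return attn @ v, attn

# Toy input (assumed): a 4x4 grid of patches (N=16) with 32-dim embeddings
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 32))
w_q, w_k, w_v = (rng.standard_normal((32, 8)) for _ in range(3))
out, attn = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (16, 8) (16, 16)
```

Because nothing in this operation prefers nearby patches, it lacks the locality bias of a CNN, which the paper associates with over-activated maps.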

Findings

1. CRT achieves new state-of-the-art performance on PASCAL VOC 2012.
2. CRT significantly outperforms previous methods on weakly supervised object localization.
3. Extensive experiments validate the effectiveness of the proposed approach.

Abstract

Weakly supervised semantic segmentation (WSSS), which aims to mine object regions using only class-level labels, is a challenging task in computer vision. Current state-of-the-art CNN-based methods usually adopt Class Activation Maps (CAMs) to highlight potential object regions; however, these maps may suffer from part activation, covering only parts of the object. To this end, we make an early attempt to explore the global feature attention mechanism of the vision transformer for the WSSS task. However, since the transformer lacks the inductive bias of CNN models, it cannot boost performance directly and may produce over-activated maps. To tackle these drawbacks, we propose a Convolutional Neural Networks Refined Transformer (CRT) that mines globally complete and locally accurate class activation maps. To validate the effectiveness of our proposed method, extensive experiments…
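In their original CNN formulation, the CAMs mentioned above are a classifier-weighted sum of the last convolutional feature maps. The NumPy sketch below shows that baseline computation under assumed shapes; it is not CRT's code, the function name `class_activation_map` is hypothetical, the toy inputs are random, and the 0.5 threshold used to form a pseudo-mask is an illustrative choice, not a value from the paper.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Baseline CAM for one image (illustrative sketch).

    features:   (C, H, W) feature maps from the last conv layer
    fc_weights: (num_classes, C) weights of the linear classifier that
                follows global average pooling
    class_idx:  index of the target class
    Returns an (H, W) map scaled to [0, 1].
    """
    # cam[h, w] = sum_c fc_weights[class_idx, c] * features[c, h, w]
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)  # ReLU: keep only positive evidence for the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # avoid dividing by zero

# Toy example (assumed sizes): 8 channels on a 4x4 grid, 3 classes
rng = np.random.default_rng(0)
feats = rng.random((8, 4, 4))
weights = rng.standard_normal((3, 8))
cam = class_activation_map(feats, weights, class_idx=1)
pseudo_mask = cam >= 0.5  # hypothetical threshold for a foreground pseudo-mask
print(cam.shape)  # (4, 4)
```

Since the map is driven by whichever features the classifier finds most discriminative, it tends to peak on distinctive object parts, which is the part-activation issue the abstract describes.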

Code & Models

Repositories

huodongjian0603/crt (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Multi-Head Attention · Adam · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Layer Normalization