MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation
Lian Xu, Mohammed Bennamoun, Farid Boussaid, Hamid Laga, Wanli Ouyang, Dan Xu

TL;DR
This paper introduces MCTformer+, a transformer-based framework that improves weakly supervised semantic segmentation by generating accurate class-specific localization maps through multiple class tokens and a contrastive learning module.
Contribution
It proposes a novel Multi-Class Token transformer with class-aware training and a Contrastive-Class-Token module to enhance class-specific object localization in WSSS.
Findings
More accurate class-specific localization maps for WSSS.
Improved segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets.
The framework integrates with the class activation mapping (CAM) method, which further improves segmentation accuracy.
Abstract
This paper proposes a novel transformer-based framework that aims to enhance weakly supervised semantic segmentation (WSSS) by generating accurate class-specific object localization maps as pseudo labels. Building upon the observation that the attended regions of the one-class token in the standard vision transformer can contribute to a class-agnostic localization map, we explore the potential of the transformer model to capture class-specific attention for class-discriminative object localization by learning multiple class tokens. We introduce a Multi-Class Token transformer, which incorporates multiple class tokens to enable class-aware interactions with the patch tokens. To achieve this, we devise a class-aware training strategy that establishes a one-to-one correspondence between the output class tokens and the ground-truth class labels. Moreover, a Contrastive-Class-Token (CCT)…
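The two ideas in the abstract, class-specific attention read from multiple class tokens and a one-to-one match between output class tokens and labels, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names are hypothetical, class tokens are assumed to come first in the token sequence, a single attention matrix stands in for whatever layers and heads the model actually fuses, and the class score is assumed to be the mean of each output class token, trained with a multi-label soft-margin loss.

```python
import math


def class_to_patch_maps(class_rows, num_classes, grid_h, grid_w):
    """Slice class-to-patch attention into one grid_h x grid_w map per class.

    class_rows[c] is the attention row of class token c over all tokens,
    assumed to be ordered as [class tokens..., patch tokens...].
    """
    num_patches = grid_h * grid_w
    maps = []
    for c in range(num_classes):
        # Keep only the attention this class token pays to patch tokens.
        patch_attn = class_rows[c][num_classes:num_classes + num_patches]
        # Reshape the flat patch sequence back to the image grid.
        maps.append([patch_attn[r * grid_w:(r + 1) * grid_w]
                     for r in range(grid_h)])
    return maps


def class_scores(class_tokens):
    """One score per class: the mean of that class token's embedding (assumed)."""
    return [sum(tok) / len(tok) for tok in class_tokens]


def multilabel_soft_margin_loss(scores, labels):
    """Mean per-class logistic loss; labels[c] is 1 if class c is in the image."""
    total = 0.0
    for s, y in zip(scores, labels):
        z = -s if y == 1 else s  # penalise low scores on positives, high on negatives
        total += math.log1p(math.exp(z))
    return total / len(scores)


# Toy example: 2 classes, a 2x2 patch grid (so rows have 2 + 4 entries).
rows = [
    [0.1, 0.1, 0.5, 0.1, 0.1, 0.1],  # class 0 attends mostly to the top-left patch
    [0.1, 0.1, 0.1, 0.1, 0.1, 0.5],  # class 1 attends mostly to the bottom-right patch
]
maps = class_to_patch_maps(rows, num_classes=2, grid_h=2, grid_w=2)

# Toy output class tokens (3 classes, embedding size 2) and image-level labels.
tokens = [[2.0, 4.0], [-3.0, -1.0], [0.0, 0.0]]
labels = [1, 0, 1]
scores = class_scores(tokens)
loss = multilabel_soft_margin_loss(scores, labels)
```

Because each class token yields exactly one score that is compared against exactly one label, the loss ties token c to class c, which is the one-to-one correspondence the abstract describes. The Contrastive-Class-Token module is not sketched here, since the abstract text is truncated before it is described.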
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · COVID-19 diagnosis using AI
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Dense Connections · Residual Connection · Layer Normalization · Vision Transformer
