Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation
Wangyu Wu, Tianhong Dai, Zhenhong Chen, Xiaowei Huang, Jimin Xiao, Fei Ma, Renrong Ouyang

TL;DR
This paper introduces Adaptive Patch Contrast, a ViT-based method for weakly supervised semantic segmentation that improves patch embedding learning and training efficiency, outperforming state-of-the-art methods on standard datasets.
Contribution
The paper proposes a novel ViT-based WSSS method with adaptive pooling and contrastive learning, transforming multi-stage training into an efficient single-stage framework.
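Why the pooling step matters: if an image-level class score is taken as the maximum over patch scores, a single abnormal patch can decide the result on its own. The sketch below contrasts max pooling with top-k mean pooling, where k scales with the number of patches. This is a generic illustration under stated assumptions, not APC's actual adaptive pooling layer; the function names and the `ratio` hyperparameter are hypothetical.

```python
from typing import Sequence


def max_pool(scores: Sequence[float]) -> float:
    """Image-level score = the single highest patch score."""
    if not scores:
        raise ValueError("need at least one patch score")
    return max(scores)


def top_k_mean_pool(scores: Sequence[float], ratio: float = 0.25) -> float:
    """Image-level score = mean of the k highest patch scores.

    Hypothetical variant: k is tied to the patch count through `ratio`,
    so one outlier patch contributes only 1/k of the pooled score.
    """
    if not scores:
        raise ValueError("need at least one patch score")
    k = max(1, round(ratio * len(scores)))  # always keep at least one patch
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


if __name__ == "__main__":
    # 15 ordinary patches plus one abnormal, spuriously confident patch.
    patch_scores = [0.1] * 15 + [0.95]
    print(max_pool(patch_scores))
    print(top_k_mean_pool(patch_scores))
```

With 16 patch scores where one abnormal patch scores 0.95 and the rest score 0.1, max pooling returns 0.95, while top-k mean pooling with `ratio=0.25` (k = 4) returns about 0.31, so the outlier no longer dominates the image-level decision.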
Findings
Outperforms state-of-the-art methods on PASCAL VOC 2012
Achieves better segmentation accuracy with less training time
Enhances patch embedding quality through contrastive learning
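One common way to improve patch embeddings with contrastive learning is to pull together patches that share a (pseudo-)class and push apart patches of different classes. The sketch below implements a generic supervised-contrastive (SupCon-style) loss over patch embeddings in plain Python. It is an assumed stand-in for illustration only, not the paper's exact patch contrastive objective; `patch_contrastive_loss`, the cosine similarity, and the temperature `tau` are assumptions.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, with a small floor to avoid dividing by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, 1e-12)


def patch_contrastive_loss(embeddings: Sequence[Sequence[float]],
                           labels: Sequence[int],
                           tau: float = 0.1) -> float:
    """Generic supervised contrastive loss over patch embeddings.

    Each patch is an anchor; patches with the same label are positives,
    all other patches are negatives. Anchors without positives are skipped.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity from anchor i to every other patch.
        logits = {j: cosine(embeddings[i], embeddings[j]) / tau
                  for j in range(n) if j != i}
        # Numerically stable log of the softmax denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

When embeddings already cluster by label, the loss is small; assigning the same embeddings inconsistent labels makes it much larger, which is the gradient signal that reshapes the embedding space during training.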
Abstract
Weakly Supervised Semantic Segmentation (WSSS) using only image-level labels has gained significant attention due to its cost-effectiveness. The typical framework uses image-level labels as training data to generate pixel-level pseudo-labels, which are then refined. Recently, methods based on Vision Transformers (ViT) have shown a stronger ability than CNN-based methods to generate reliable pseudo-labels, particularly in recognizing complete object regions. However, current ViT-based approaches have limitations in how they use patch embeddings, which are prone to being dominated by certain abnormal patches; in addition, many multi-stage methods are time-consuming and lengthy to train, and therefore lack efficiency. In this paper, we introduce a novel ViT-based WSSS method named Adaptive Patch Contrast (APC) that significantly enhances patch embedding…
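The step from image-level labels to pixel-level pseudo-labels is usually done with class activation maps (CAMs): one map per class present in the image, normalized, then assigned pixel by pixel with a background threshold. The sketch below shows this generic CAM-thresholding step in plain Python. It assumes PASCAL VOC-style ids (0 for background, foreground classes from 1), and the function name and `bg_threshold` value are hypothetical; it is not APC's own pseudo-label generation or refinement.

```python
from typing import Dict, List

BACKGROUND = 0  # VOC-style convention: 0 is background, classes start at 1


def _min_max(cam: List[List[float]]) -> List[List[float]]:
    """Rescale one activation map to [0, 1]; a flat map becomes all zeros."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


def cams_to_pseudo_label(cams: Dict[int, List[List[float]]],
                         bg_threshold: float = 0.4) -> List[List[int]]:
    """Turn per-class CAMs into an H x W pseudo-label.

    `cams` holds one H x W map per class in the image-level label. Each
    pixel takes the class with the highest normalized score, or background
    if that score is below `bg_threshold`.
    """
    if not cams:
        raise ValueError("at least one class map is required")
    norm = {c: _min_max(cam) for c, cam in cams.items()}
    first = next(iter(norm.values()))
    h, w = len(first), len(first[0])
    label = [[BACKGROUND] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_cls = max(norm, key=lambda c: norm[c][y][x])
            if norm[best_cls][y][x] >= bg_threshold:
                label[y][x] = best_cls
    return label
```

The threshold trades recall for precision: raising `bg_threshold` labels fewer pixels as foreground, which is why WSSS pipelines typically add a refinement stage afterwards.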
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Brain Tumor Detection and Classification
Methods: Softmax · Attention Is All You Need · Max Pooling · Class Activation Map · Contrastive Learning
