MoRe: Class Patch Attention Needs Regularization for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song

TL;DR
This paper introduces MoRe, a regularization framework for class-patch attention in ViT-based weakly supervised semantic segmentation (WSSS) that reduces artifacts and improves localization accuracy.
Contribution
MoRe proposes novel graph-based and localization-informed regularizations to enhance class-patch attention modeling in WSSS with ViT, addressing artifact issues and achieving state-of-the-art results.
Findings
MoRe significantly reduces false activations in localization maps.
It outperforms recent methods on PASCAL VOC and MS COCO datasets.
The approach effectively regularizes class-patch attention for better segmentation.
Abstract
Weakly Supervised Semantic Segmentation (WSSS) with image-level labels typically uses Class Activation Maps (CAM) to achieve dense predictions. Recently, Vision Transformer (ViT) has provided an alternative to generate localization maps from class-patch attention. However, due to insufficient constraints on modeling such attention, we observe that the Localization Attention Maps (LAM) often struggle with the artifact issue, i.e., patch regions with minimal semantic relevance are falsely activated by class tokens. In this work, we propose MoRe to address this issue and further explore the potential of LAM. Our findings suggest that imposing additional regularization on class-patch attention is necessary. To this end, we first view the attention as a novel directed graph and propose the Graph Category Representation module to implicitly regularize the interaction among class-patch…
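As a rough illustration of what "generating localization maps from class-patch attention" can look like, the sketch below reads the class-to-patch block out of a single attention matrix and reshapes it into one map per class. It is not the authors' implementation: the token layout (class tokens first, then patch tokens in row-major order), the min-max normalization, and the function name `class_patch_maps` are all assumptions made for this example.

```python
def class_patch_maps(attn, num_classes, grid_h, grid_w):
    """Turn class-to-patch attention into per-class 2D maps.

    attn is a square list of lists (tokens x tokens), assumed to be already
    averaged over heads, with class tokens first and patch tokens after them
    (an assumed layout, not taken from the paper).
    """
    num_tokens = num_classes + grid_h * grid_w
    if len(attn) != num_tokens or any(len(row) != num_tokens for row in attn):
        raise ValueError("attn must be square with num_classes + grid_h * grid_w tokens")

    maps = []
    for c in range(num_classes):
        # Row c holds what class token c attends to; skip the class-token columns.
        scores = attn[c][num_classes:]
        lo, hi = min(scores), max(scores)
        scale = (hi - lo) or 1.0  # avoid division by zero on a flat map
        norm = [(s - lo) / scale for s in scores]
        # Reshape the flat patch scores into grid_h rows of grid_w values.
        maps.append([norm[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)])
    return maps


# Toy input: 2 class tokens and a 2x2 patch grid, so 6 tokens in total.
toy_attn = [
    [0.1, 0.1, 0.4, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.2, 0.4],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]
toy_maps = class_patch_maps(toy_attn, num_classes=2, grid_h=2, grid_w=2)
```

In this picture, an artifact in the abstract's sense would be a patch that scores high in a class map even though it has little semantic relevance to that class. The sketch only extracts the maps and applies no regularization.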
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Adam · Vision Transformer · Layer Normalization · Dropout · Class-activation map · Position-Wise Feed-Forward Layer · Label Smoothing · Dense Connections
