Know Your Attention Maps: Class-specific Token Masking for Weakly Supervised Semantic Segmentation

Joelle Hanna; Damian Borth

arXiv:2507.06848 · cs.CV · July 10, 2025

TL;DR

This paper introduces an end-to-end method leveraging Vision Transformer attention maps for weakly supervised semantic segmentation, improving pseudo-mask quality and reducing reliance on detailed annotations.

Contribution

The paper proposes training a sparse ViT with multiple class-specific [CLS] tokens (one per class) and a random masking strategy that promotes [CLS] token-to-class assignment, so that pseudo-masks can be generated directly from the attention maps.

Findings

01. Outperforms existing methods on standard benchmarks
02. Generates pseudo-masks comparable to fully-supervised models
03. Reduces need for detailed pixel-level annotations

Abstract

Weakly Supervised Semantic Segmentation (WSSS) is a challenging problem that has been extensively studied in recent years. Traditional approaches often rely on external modules like Class Activation Maps to highlight regions of interest and generate pseudo segmentation masks. In this work, we propose an end-to-end method that directly utilizes the attention maps learned by a Vision Transformer (ViT) for WSSS. We propose training a sparse ViT with multiple [CLS] tokens (one for each class), using a random masking strategy to promote [CLS] token - class assignment. At inference time, we aggregate the different self-attention maps of each [CLS] token corresponding to the predicted labels to generate pseudo segmentation masks. Our proposed approach enhances the interpretability of self-attention maps and ensures accurate class assignments. Extensive experiments on two standard benchmarks…
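
The inference step described in the abstract (aggregating the self-attention maps of the [CLS] tokens that correspond to predicted labels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `pseudo_mask_from_attention`, the head averaging, the per-class min-max normalisation, the two thresholds, and the argmax assignment are all assumptions made for illustration. Only the idea of keeping the maps of predicted classes comes from the abstract.

```python
import numpy as np

def pseudo_mask_from_attention(cls_attn, class_probs, grid_hw,
                               prob_thresh=0.5, bg_thresh=0.3):
    """Build a patch-level pseudo-mask from class-token attention (illustrative sketch).

    cls_attn:    (num_classes, num_heads, num_patches) attention weights from
                 each class-specific [CLS] token to the image patches.
    class_probs: (num_classes,) predicted class probabilities.
    grid_hw:     (H, W) patch grid, with H * W == num_patches.
    Returns an (H, W) int array: 0 = background, k + 1 = class k.
    """
    num_classes, num_heads, num_patches = cls_attn.shape
    h, w = grid_hw
    assert h * w == num_patches, "patch grid does not match attention length"

    # Assumption: aggregate heads by a plain average.
    maps = cls_attn.mean(axis=1)                      # (num_classes, num_patches)

    # Assumption: min-max normalise each class map to [0, 1].
    lo = maps.min(axis=1, keepdims=True)
    hi = maps.max(axis=1, keepdims=True)
    maps = (maps - lo) / np.maximum(hi - lo, 1e-8)

    # From the abstract: keep only the maps of the predicted classes.
    predicted = class_probs >= prob_thresh
    maps = np.where(predicted[:, None], maps, 0.0)

    # Assumption: assign each patch to its strongest class, or to background
    # when no kept class map exceeds bg_thresh.
    best = maps.argmax(axis=0)
    score = maps.max(axis=0)
    labels = np.where(score >= bg_thresh, best + 1, 0)
    return labels.reshape(h, w)
```

In practice the patch-level mask would still need to be upsampled to the image resolution. That step, and any refinement the paper applies, is left out of this sketch.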

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications