Context Patch Fusion With Class Token Enhancement for Weakly Supervised Semantic Segmentation
Yiyang Fu, Hui Li, Wangyu Wu

TL;DR
This paper introduces CPF-CTE, a novel framework for weakly supervised semantic segmentation that leverages contextual patch relations and class token enhancements to improve feature representation and segmentation accuracy.
Contribution
The proposed CPF-CTE framework uniquely combines bidirectional LSTM-based contextual fusion with learnable class tokens to better capture spatial and semantic dependencies in WSSS.
Findings
Outperforms previous WSSS methods on PASCAL VOC 2012
Achieves higher segmentation accuracy on MS COCO 2014
Enhances feature representation through contextual and semantic integration
Abstract
Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a…
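The bidirectional patch fusion described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The row-major patch ordering, the residual connection, the output projection `W_out`, all dimensions, and the way class tokens are concatenated with the fused patches are assumptions made for illustration, since the summary above does not specify them. The names `LSTMCell` and `bilstm_patch_fusion` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Single-direction LSTM with random weights (illustrative, untrained)."""
    def __init__(self, input_dim, hidden_dim, rng):
        scale = 1.0 / np.sqrt(hidden_dim)
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.uniform(-scale, scale, (input_dim + hidden_dim, 4 * hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def run(self, seq):
        """seq: (L, input_dim) -> hidden states of shape (L, hidden_dim)."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        outputs = []
        for x in seq:
            gates = np.concatenate([x, h]) @ self.W + self.b
            i, f, g, o = np.split(gates, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)

def bilstm_patch_fusion(patches, fwd, bwd, W_out):
    """patches: (L, D) patch embeddings. Returns context-enriched patches, (L, D)."""
    h_fwd = fwd.run(patches)               # context from earlier patches
    h_bwd = bwd.run(patches[::-1])[::-1]   # context from later patches, re-aligned
    context = np.concatenate([h_fwd, h_bwd], axis=1) @ W_out
    return patches + context               # residual connection (assumption)

# Hypothetical sizes: 16 patches, 8-dim embeddings, 20 classes.
rng = np.random.default_rng(0)
num_patches, dim, hidden, num_classes = 16, 8, 6, 20
patches = rng.normal(size=(num_patches, dim))
fwd = LSTMCell(dim, hidden, rng)
bwd = LSTMCell(dim, hidden, rng)
W_out = rng.normal(scale=0.1, size=(2 * hidden, dim))

# One token per class; these would be learned during training (assumption).
class_tokens = np.zeros((num_classes, dim))

fused = bilstm_patch_fusion(patches, fwd, bwd, W_out)
tokens = np.concatenate([class_tokens, fused], axis=0)  # input to later layers
```

Running the backward pass on the reversed sequence and then re-reversing its output is what lets each patch receive information from patches on both sides of it. In a real model the weights and class tokens would be trained, and a framework LSTM layer would replace the hand-written cell.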
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
