Weakly-supervised Semantic Segmentation via Dual-stream Contrastive Learning of Cross-image Contextual Information
Qi Lai, Chi-Man Vong

TL;DR
This paper introduces DSCNet, a weakly-supervised semantic segmentation (WSSS) framework that uses dual-stream contrastive learning to jointly handle pixel-wise and semantic-wise context information, and reports improved results on PASCAL VOC and MS COCO.
Contribution
The paper proposes a new end-to-end WSSS framework with pixel-wise group contrast and semantic-wise graph contrast, integrating dual-stream contrastive learning for enhanced segmentation accuracy.
Findings
Outperforms state-of-the-art methods on PASCAL VOC and MS COCO datasets.
Demonstrates the effectiveness of dual-stream contrastive learning in WSSS.
Achieves significant performance gains over baseline models.
Abstract
Weakly supervised semantic segmentation (WSSS) aims at learning a semantic segmentation model with only image-level tags. Despite intensive research on deep learning approaches over the past decade, there is still a significant performance gap between WSSS and full semantic segmentation. Most current WSSS methods focus on limited single-image (pixel-wise) information while ignoring valuable inter-image (semantic-wise) information. From this perspective, a novel end-to-end WSSS framework called DSCNet is developed along with two innovations: i) pixel-wise group contrast and semantic-wise graph contrast are proposed and introduced into the WSSS framework; ii) a novel dual-stream contrastive learning (DSCL) mechanism is designed to jointly handle pixel-wise and semantic-wise context information for better WSSS performance. Specifically, the pixel-wise group contrast learning (PGCL)…
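The abstract is cut off before PGCL is described, so the sketch below is not the authors' formulation. It only illustrates the general idea of pixel-wise contrast with a generic supervised InfoNCE-style loss: pixel embeddings that share a (pseudo) label are pulled together and all others are pushed apart. The function name `pixel_contrastive_loss`, the `temperature` value, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised InfoNCE-style loss over pixel embeddings.

    Hypothetical illustration, not DSCNet's PGCL.
    embeddings: (N, D) array, one row per sampled pixel (N >= 2); the pixels
                may come from one image or from several images.
    labels:     (N,) integer array of (pseudo) class labels, one per pixel.
    """
    # L2-normalise so that dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature  # (N, N) pairwise similarities

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Positives: other pixels that carry the same label as the anchor.
    positives = (labels[:, None] == labels[None, :]) & not_self

    # Log-softmax over all other pixels (self excluded), computed stably.
    logits = np.where(not_self, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    log_denominator = row_max + np.log(
        np.exp(logits - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = logits - log_denominator

    # Mean log-probability of the positives per anchor; anchors without
    # any positive are skipped.
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Four toy "pixels" forming two tight clusters in a 2-D embedding space.
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matching = pixel_contrastive_loss(toy_embeddings, np.array([0, 0, 1, 1]))
loss_mismatched = pixel_contrastive_loss(toy_embeddings, np.array([0, 1, 0, 1]))
```

In this toy case the loss is lower when the labels agree with the embedding clusters than when they cut across them, which is the signal a contrastive objective uses to shape the embedding space. If no anchor has a positive, the function returns NaN, so a real training loop would need to guard against that case.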
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Face and Expression Recognition
Methods: Contrastive Learning · Focus
