Mitigating Spurious Correlations in Weakly Supervised Semantic Segmentation via Cross-architecture Consistency Regularization
Zheyuan Zhang, Yen-chia Hsu

TL;DR
This paper introduces a novel weakly supervised semantic segmentation framework that uses cross-architecture consistency between CNNs and ViTs to reduce spurious correlations and improve segmentation quality without external priors.
Contribution
The proposed method employs a teacher-student framework with CNNs and ViTs, introducing a knowledge transfer loss for cross-architecture consistency to mitigate inherent model bias.
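A consistency term of this kind can be sketched as follows. This is a minimal illustration, not the paper's loss: the summary does not say which network acts as teacher, what form the knowledge transfer loss takes, or which features it compares. The sketch assumes both models produce per-class activation maps of the same size and penalizes their mean squared disagreement on classes present in the image-level label. The names `normalize_map` and `consistency_loss` are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
from typing import List

Map = List[List[float]]  # one H x W activation map for a single class


def normalize_map(cam: Map, eps: float = 1e-8) -> Map:
    """Min-max scale one activation map to roughly the range [0, 1]."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo + eps) for v in row] for row in cam]


def consistency_loss(student: List[Map], teacher: List[Map],
                     labels: List[int]) -> float:
    """Assumed form of a cross-architecture consistency term.

    Mean squared gap between the two models' normalized maps,
    averaged over the classes marked present by the image-level labels.
    """
    total, present = 0.0, 0
    for s_cam, t_cam, y in zip(student, teacher, labels):
        if not y:
            continue  # class absent from the image: no consistency term
        s, t = normalize_map(s_cam), normalize_map(t_cam)
        sq = [(a - b) ** 2 for rs, rt in zip(s, t) for a, b in zip(rs, rt)]
        total += sum(sq) / len(sq)
        present += 1
    return total / present if present else 0.0


# Toy 2x2 maps for one class: the two models highlight opposite columns.
cnn_maps = [[[0.0, 1.0], [0.0, 1.0]]]
vit_maps = [[[1.0, 0.0], [1.0, 0.0]]]
loss = consistency_loss(cnn_maps, vit_maps, labels=[1])
```

In a real training loop this term would be computed on tensors with an autodiff framework and added to the classification loss, with gradients typically flowing only into the student. That detail is also an assumption here.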
Findings
Improved segmentation accuracy over baseline methods.
Reduced spurious correlations in weakly supervised segmentation.
Enhanced pseudo mask quality through post-processing techniques.
Abstract
Scarcity of pixel-level labels is a significant challenge in practical scenarios. In specific domains like industrial smoke, acquiring such detailed annotations is particularly difficult and often requires expert knowledge. To alleviate this, weakly supervised semantic segmentation (WSSS) has emerged as a promising approach. However, due to the supervision gap and inherent bias in models trained with only image-level labels, existing WSSS methods suffer from limitations such as incomplete foreground coverage, inaccurate object boundaries, and spurious correlations, especially in our domain, where emissions are always spatially coupled with chimneys. Previous solutions typically rely on additional priors or external knowledge to mitigate these issues, but they often lack scalability and fail to address the model's inherent bias toward co-occurring context. To address this, we propose a…
