Through the Looking Glass: A Dual Perspective on Weakly-Supervised Few-Shot Segmentation
Jiaqi Ma, Guo-Sen Xie, Fang Zhao, and Zechao Li

TL;DR
This paper introduces a dual-perspective approach that combines heterogeneous networks with multimodal (CLIP text) information to improve weakly-supervised few-shot segmentation, reporting state-of-the-art results with fewer parameters.
Contribution
It proposes a homologous but heterogeneous network built from heterogeneous visual aggregation (HA), heterogeneous transfer (HT), and heterogeneous CLIP (HC) text modules, outperforming existing models on weakly-supervised few-shot segmentation.
Findings
13.2% improvement on Pascal-5i
9.7% improvement on COCO-20i
Outperforms fully supervised models with fewer parameters
Abstract
Meta-learning aims to uniformly sample homogeneous support-query pairs, characterized by the same categories and similar attributes, and extract useful inductive biases through identical network architectures. However, this identical network design results in over-semantic homogenization. To address this, we propose a novel homologous but heterogeneous network. By treating support-query pairs as dual perspectives, we introduce heterogeneous visual aggregation (HA) modules to enhance complementarity while preserving semantic commonality. To further reduce semantic noise and amplify the uniqueness of heterogeneous semantics, we design a heterogeneous transfer (HT) module. Finally, we propose heterogeneous CLIP (HC) textual information to enhance the generalization capability of multimodal models. In the weakly-supervised few-shot semantic segmentation (WFSS) task, with only 1/24 of the…
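The support-query sampling that the abstract takes as its starting point can be illustrated with a small sketch. This is not the authors' code: the function name `sample_episode`, the `labels` list, and the `k_shot` parameter are hypothetical, and the sketch only shows the generic episodic setup in which a class is chosen uniformly and the support and query images are drawn from that same class.

```python
import random
from collections import defaultdict


def sample_episode(labels, k_shot=1, rng=None):
    """Sample one few-shot episode (hypothetical helper, not from the paper).

    labels: list where labels[i] is the class of image i.
    Returns (class, support_indices, query_index), all from the same class.
    """
    rng = rng or random.Random()

    # Group image indices by class.
    by_class = defaultdict(list)
    for idx, cls in enumerate(labels):
        by_class[cls].append(idx)

    # A class is usable only if it has k_shot support images plus one query.
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= k_shot + 1)
    if not eligible:
        raise ValueError("no class has enough images for this k_shot")

    # Uniform choice of class, then distinct images within that class.
    cls = rng.choice(eligible)
    picked = rng.sample(by_class[cls], k_shot + 1)
    return cls, picked[:k_shot], picked[k_shot]


if __name__ == "__main__":
    toy_labels = ["cat", "cat", "dog", "dog", "dog", "bird"]
    print(sample_episode(toy_labels, k_shot=1, rng=random.Random(0)))
```

Because the support and query images share a class, a pipeline that processes both with an identical network sees very similar inputs on both branches, which is the homogenization the abstract sets out to counter.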
