Learning Robust Correlation with Foundation Model for Weakly-Supervised Few-Shot Segmentation
Xinyang Huang, Chuang Zhu, Kebin Liu, Ruiying Ren, Shengjie Liu

TL;DR
This paper introduces CORENet, a novel weakly-supervised few-shot segmentation model that leverages foundation models and multi-information guidance to learn robust correlations from image-level labels, reducing reliance on pixel masks.
Contribution
The work proposes a Correlation Enhancement Network (CORENet) that integrates a correlation-guided transformer (CGT), a class-guided module (CGM), and an embedding-guided module to improve weakly-supervised few-shot segmentation performance.
Findings
CORENet outperforms existing methods on PASCAL-5i and COCO-20i datasets.
The model effectively learns robust correlations from weak supervision.
Extensive experiments demonstrate the superiority of CORENet.
Abstract
Existing few-shot segmentation (FSS) methods only consider learning support-query correlation and segmenting unseen categories under precise pixel masks. However, obtaining a large number of pixel masks for training is expensive. This paper considers a more challenging scenario, weakly-supervised few-shot segmentation (WS-FSS), which only provides category (image-level) labels. It requires the model to learn robust support-query information when the generated mask is inaccurate. In this work, we design a Correlation Enhancement Network (CORENet) with a foundation model, which utilizes multi-information guidance to learn robust correlation. Specifically, the correlation-guided transformer (CGT) utilizes self-supervised ViT tokens to learn robust correlation from both local and global perspectives. From the perspective of semantic categories, the class-guided module (CGM) guides the…
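To make the idea of a support-query correlation concrete, the sketch below computes a cosine-similarity map between two sets of patch tokens and derives a per-token prior for the query. It is a generic illustration, not CORENet's CGT: the function names `cosine_correlation` and `query_prior`, the token shape (196 patches of dimension 384), and the random stand-in data are all assumptions made for this example.

```python
import numpy as np


def cosine_correlation(query_tokens, support_tokens, eps=1e-8):
    """Return an (Nq, Ns) matrix of cosine similarities between patch tokens."""
    q = query_tokens / (np.linalg.norm(query_tokens, axis=1, keepdims=True) + eps)
    s = support_tokens / (np.linalg.norm(support_tokens, axis=1, keepdims=True) + eps)
    return q @ s.T


def query_prior(corr, support_mask):
    """For each query token, take its best match among support foreground tokens."""
    fg = support_mask.astype(bool)
    if not fg.any():
        # No foreground tokens in the (possibly noisy) mask: return a flat prior.
        return np.zeros(corr.shape[0])
    return corr[:, fg].max(axis=1)


# Hypothetical stand-ins for ViT patch tokens and a noisy pseudo mask.
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(196, 384))
support_tokens = rng.normal(size=(196, 384))
support_mask = rng.random(196) > 0.7

corr = cosine_correlation(query_tokens, support_tokens)
prior = query_prior(corr, support_mask)
```

In the weakly-supervised setting described above, `support_mask` would be a pseudo mask generated from image-level labels, so a prior built this way inherits its errors; that is the robustness problem the abstract refers to.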
Taxonomy
Methods: Contrastive Language-Image Pre-training
