Learning Robust Correlation with Foundation Model for Weakly-Supervised Few-Shot Segmentation

Xinyang Huang; Chuang Zhu; Kebin Liu; Ruiying Ren; Shengjie Liu

arXiv:2405.19638·cs.CV·May 31, 2024

TL;DR

This paper introduces CORENet, a weakly-supervised few-shot segmentation model that uses a foundation model and multi-information guidance to learn robust support-query correlations from image-level labels, reducing reliance on pixel masks.

Contribution

The work proposes a Correlation Enhancement Network (CORENet) that combines a correlation-guided transformer (CGT), a class-guided module (CGM), and an embedding-guided module to improve weakly-supervised few-shot segmentation performance.

Findings

1. CORENet outperforms existing methods on the PASCAL-$5^i$ and COCO-$20^i$ datasets.

2. The model effectively learns robust correlations from weak supervision.

3. Extensive experiments demonstrate the superiority of CORENet.

Abstract

Existing few-shot segmentation (FSS) only considers learning support-query correlation and segmenting unseen categories under precise pixel masks. However, annotating a large number of pixel masks for training is expensive. This paper considers a more challenging scenario, weakly-supervised few-shot segmentation (WS-FSS), which only provides category (i.e., image-level) labels. It requires the model to learn robust support-query information when the generated mask is inaccurate. In this work, we design a Correlation Enhancement Network (CORENet) with a foundation model, which utilizes multi-information guidance to learn robust correlation. Specifically, the correlation-guided transformer (CGT) utilizes self-supervised ViT tokens to learn robust correlation from both local and global perspectives. From the perspective of semantic categories, the class-guided module (CGM) guides the…
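
As a rough illustration of what a support-query correlation can look like, the sketch below computes a cosine-similarity matrix between two sets of patch tokens and derives a per-query-token foreground prior from a support pseudo mask. This is not CORENet's implementation: the function names, the token shapes (196 tokens of dimension 384), the random inputs, and the max-over-foreground aggregation are all assumptions chosen for illustration.

```python
import numpy as np


def cosine_correlation(query_tokens, support_tokens, eps=1e-8):
    """Return an (Nq, Ns) matrix of cosine similarities between two token sets."""
    q = query_tokens / (np.linalg.norm(query_tokens, axis=1, keepdims=True) + eps)
    s = support_tokens / (np.linalg.norm(support_tokens, axis=1, keepdims=True) + eps)
    return q @ s.T


def foreground_prior(corr, support_mask):
    """For each query token, take its best match among foreground support tokens."""
    fg = np.asarray(support_mask, dtype=bool)
    if not fg.any():
        # No foreground tokens in the (possibly inaccurate) mask: return a flat prior.
        return np.zeros(corr.shape[0])
    return corr[:, fg].max(axis=1)


# Hypothetical inputs: random stand-ins for ViT patch tokens and a noisy pseudo mask.
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(196, 384))
support_tokens = rng.normal(size=(196, 384))
support_mask = rng.random(196) > 0.5

corr = cosine_correlation(query_tokens, support_tokens)
prior = foreground_prior(corr, support_mask)
```

In this sketch an inaccurate support mask directly corrupts the prior, which is the kind of failure mode the abstract says the weakly-supervised setting has to be robust to.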

Taxonomy

Methods: Contrastive Language-Image Pre-training