Probabilistic Prototype Calibration of Vision-Language Models for Generalized Few-shot Semantic Segmentation

Jie Liu; Jiayi Shen; Pan Zhou; Jan-Jakob Sonke; Efstratios Gavves

arXiv:2506.22979·cs.CV·July 1, 2025

Open Access

TL;DR

FewCLIP introduces a probabilistic calibration framework for vision-language models that enhances adaptability and generalization in generalized few-shot semantic segmentation, outperforming existing methods on standard benchmarks.

Contribution

The paper proposes a probabilistic prototype calibration method for GFSS built on CLIP, aiming to improve adaptability and reduce overfitting in few-shot scenarios.

Findings

01. Outperforms state-of-the-art methods on the PASCAL-5i and COCO-20i datasets.

02. Provides uncertainty-aware prototype learning for better generalization.

03. Demonstrates effectiveness in class-incremental settings.

Abstract

Generalized Few-Shot Semantic Segmentation (GFSS) aims to extend a segmentation model to novel classes with only a few annotated examples while maintaining performance on base classes. Recently, pretrained vision-language models (VLMs) such as CLIP have been leveraged in GFSS to improve generalization on novel classes through multi-modal prototype learning. However, existing prototype-based methods are inherently deterministic, which limits the adaptability of the learned prototypes to diverse samples, particularly for novel classes with scarce annotations. To address this, we propose FewCLIP, a probabilistic prototype calibration framework over multi-modal prototypes from the pretrained CLIP, thus providing more adaptive prototype learning for GFSS. Specifically, FewCLIP first introduces a prototype calibration mechanism, which refines frozen textual prototypes with learnable visual…
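
The calibration idea in the abstract can be illustrated with a small sketch. The abstract is truncated before the details, so everything concrete here is an assumption for illustration rather than FewCLIP's actual design: the additive `visual_offset`, the diagonal Gaussian around the calibrated prototype, the Monte Carlo averaging of cosine scores, and all function names are hypothetical. It uses only the Python standard library.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def calibrate(text_proto, visual_offset):
    # ASSUMPTION: calibration is modeled as a frozen text prototype plus a
    # learnable offset; the real mechanism is not given in the abstract.
    return l2_normalize([t + o for t, o in zip(text_proto, visual_offset)])


def sample_prototypes(mean, std, n_samples, rng):
    # ASSUMPTION: a diagonal Gaussian centered on the calibrated prototype.
    return [
        l2_normalize([rng.gauss(m, s) for m, s in zip(mean, std)])
        for _ in range(n_samples)
    ]


def classify_pixel(feature, class_means, class_stds, n_samples=8, seed=0):
    """Average cosine scores over sampled prototypes and pick the best class."""
    rng = random.Random(seed)
    scores = []
    for mean, std in zip(class_means, class_stds):
        protos = sample_prototypes(mean, std, n_samples, rng)
        scores.append(sum(cosine(feature, p) for p in protos) / n_samples)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy example: two classes in a 3-dimensional embedding space.
means = [
    calibrate([1.0, 0.0, 0.0], [0.0, 0.1, 0.0]),
    calibrate([0.0, 1.0, 0.0], [0.0, 0.0, 0.1]),
]
stds = [[0.05, 0.05, 0.05], [0.05, 0.05, 0.05]]
label, scores = classify_pixel([0.9, 0.2, 0.0], means, stds)
print(label, scores)
```

With all standard deviations set to zero, the sampled prototypes collapse to the calibrated mean, and the sketch reduces to ordinary deterministic prototype matching, which is the limitation the abstract attributes to existing methods.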

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications