Selective, Regularized, and Calibrated: Harnessing Vision Foundation Models for Cross-Domain Few-Shot Semantic Segmentation
Junyuan Ma, Xunzhi Xiang, Wenbin Li, Qi Fan, Yang Gao

TL;DR
This paper introduces HERA, a three-stage framework that effectively adapts vision foundation models for cross-domain few-shot semantic segmentation, addressing overfitting and domain shift issues.
Contribution
HERA's hierarchical select-regularize-calibrate pipeline enables efficient adaptation of frozen VFMs to new domains with minimal parameter updates, surpassing state-of-the-art results.
Findings
HERA outperforms existing methods by over 4.1 mIoU on multiple benchmarks.
The framework adapts VFM features while fine-tuning less than 2.7% of the parameters.
Hierarchical layer selection improves the effectiveness of domain adaptation.
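The findings above are stated in two quantities: mIoU and the share of fine-tuned parameters. The sketch below shows one common way to compute each. It assumes binary foreground masks per episode and accumulates intersection and union per class before averaging across classes, a convention used in many few-shot segmentation benchmarks. The evaluation protocol actually used for HERA is not given in this summary. All function names and parameter counts here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def episode_stats(pred: List[int], gt: List[int]) -> Tuple[int, int]:
    """Foreground intersection and union for one flattened binary query mask (0/1)."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same number of pixels")
    inter = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    union = sum(1 for p, g in zip(pred, gt) if p == 1 or g == 1)
    return inter, union


def class_wise_miou(episodes: Iterable[Tuple[str, List[int], List[int]]]) -> float:
    """Assumed protocol: sum intersection/union per class over episodes, then average IoU (in %)."""
    inter_sum: Dict[str, int] = defaultdict(int)
    union_sum: Dict[str, int] = defaultdict(int)
    for cls, pred, gt in episodes:
        i, u = episode_stats(pred, gt)
        inter_sum[cls] += i
        union_sum[cls] += u
    ious = [inter_sum[c] / union_sum[c] for c in union_sum if union_sum[c] > 0]
    if not ious:
        raise ValueError("no class has a non-empty union")
    return 100.0 * sum(ious) / len(ious)


def trainable_fraction(param_counts: Dict[str, Tuple[int, bool]]) -> float:
    """Percentage of parameters updated during adaptation; values are (count, is_trainable)."""
    total = sum(n for n, _ in param_counts.values())
    trainable = sum(n for n, t in param_counts.values() if t)
    return 100.0 * trainable / total


if __name__ == "__main__":
    # Hypothetical toy episodes: (class name, predicted mask, ground-truth mask).
    episodes = [
        ("lesion", [1, 1, 0, 0], [1, 0, 0, 0]),
        ("lesion", [0, 1, 1, 0], [0, 1, 1, 0]),
        ("road", [1, 1, 1, 1], [1, 1, 0, 0]),
    ]
    print(class_wise_miou(episodes))
    # Hypothetical split between a frozen backbone and small trainable modules.
    print(trainable_fraction({"frozen_backbone": (300_000_000, False), "adapters": (6_000_000, True)}))
```

Accumulating per class, rather than averaging per-episode IoUs, keeps episodes with tiny objects from dominating the score.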
Abstract
Vision foundation models (VFMs) have achieved strong performance across various vision tasks. However, applying VFMs to cross-domain few-shot segmentation (CD-FSS), which segments objects of novel classes under domain shifts using only a few labeled exemplars, remains challenging. The challenge is mainly driven by two factors: (1) the small number of labeled exemplars per novel class relative to the scale of VFM pre-training, which makes the model prone to overfitting during retraining, and (2) target-domain shifts that are underrepresented during pre-training, which induce cross-domain inconsistency and layer-wise sensitivity. To address these issues, we propose Hierarchical Exemplar Representation Adaptation (HERA), a three-stage select-regularize-calibrate VFM-based segmentation framework that learns effectively from limited labels and adapts to novel domains without source-data retraining. We first…
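To make the CD-FSS setting concrete (a few labeled support exemplars guide segmentation of a query image), here is a minimal sketch of a common prototype-matching baseline: masked average pooling over support features, followed by cosine similarity to each query pixel. This is not HERA's select-regularize-calibrate pipeline, whose details are truncated above. It assumes per-pixel features have already been extracted by a frozen backbone (for example, a VFM) and are passed in as plain vectors; the function names and the 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def masked_average_prototype(support_feats: Sequence[Vector], support_mask: Sequence[int]) -> Vector:
    """Average support features over foreground pixels (masked average pooling)."""
    fg = [f for f, m in zip(support_feats, support_mask) if m == 1]
    if not fg:
        raise ValueError("support mask has no foreground pixels")
    dim = len(fg[0])
    return [sum(f[d] for f in fg) / len(fg) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def segment_query(
    query_feats: Sequence[Vector],
    shots: Sequence[Tuple[Sequence[Vector], Sequence[int]]],
    threshold: float = 0.5,
) -> List[int]:
    """Hypothetical baseline (not HERA): average the K per-shot prototypes, then mark
    a query pixel as foreground when its cosine similarity exceeds `threshold`."""
    protos = [masked_average_prototype(feats, mask) for feats, mask in shots]
    dim = len(protos[0])
    proto = [sum(p[d] for p in protos) / len(protos) for d in range(dim)]
    return [1 if cosine(q, proto) > threshold else 0 for q in query_feats]


if __name__ == "__main__":
    # One support shot with three pixels (2-D toy features) and its binary mask.
    support = ([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]], [1, 0, 1])
    print(segment_query([[0.9, 0.0], [0.0, 1.0]], [support]))
```

Because only the support masks define the class, a baseline like this needs no retraining on novel classes, but it inherits whatever domain bias the frozen features carry, which is the gap the abstract attributes to domain shift.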
