Revealing the Semantic Selection Gap in DINOv3 through Training-Free Few-Shot Segmentation

Hussni Mohd Zakir; Eric Tatt Wei Ho

arXiv:2602.07550·cs.CV·February 10, 2026

Open Access

TL;DR

This paper investigates the intrinsic few-shot semantic segmentation capabilities of frozen DINOv3 features using a training-free approach, revealing a significant semantic selection gap and establishing the last-layer features as a strong baseline.

Contribution

The paper introduces FSSDINO, a training-free few-shot segmentation method built on frozen DINOv3 features, and uses an Oracle-guided layer analysis to expose the semantic selection gap in foundation models.

Findings

01

FSSDINO achieves competitive results without training.

02

An Oracle-guided layer analysis shows a significant performance gap between the standard last-layer features and the best-performing intermediate layers.

03

Traditional heuristics fail to identify high-fidelity features reliably.
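
The layer gap in the second finding can be illustrated with a small sketch. It assumes that "Oracle" means choosing, with access to ground-truth masks, the single layer whose predictions give the highest mean IoU over all episodes; the paper's exact protocol may differ. The function names (`iou`, `mean_iou_per_layer`, `oracle_gap`) and the dictionary layout are hypothetical and chosen only for illustration.

```python
def iou(pred, target):
    """Intersection-over-union of two flat binary masks of equal length."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter / union if union else 1.0  # two empty masks count as a match


def mean_iou_per_layer(preds_by_layer, targets):
    """preds_by_layer: {layer index: [mask per episode]}; targets: [mask per episode]."""
    return {
        layer: sum(iou(p, t) for p, t in zip(preds, targets)) / len(targets)
        for layer, preds in preds_by_layer.items()
    }


def oracle_gap(preds_by_layer, targets, last_layer):
    """Return the oracle-selected layer and its mean-IoU margin over the last layer.

    Assumption: the oracle sees the ground truth, so this is an upper bound
    on what any label-free layer-selection heuristic could reach.
    """
    scores = mean_iou_per_layer(preds_by_layer, targets)
    best_layer = max(scores, key=scores.get)
    return best_layer, scores[best_layer] - scores[last_layer]
```

A positive margin from `oracle_gap` means some intermediate layer would have beaten the last layer had it been selectable in advance, which is the gap the findings describe.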

Abstract

Recent self-supervised Vision Transformers (ViTs), such as DINOv3, provide rich feature representations for dense vision tasks. This study investigates the intrinsic few-shot semantic segmentation (FSS) capabilities of frozen DINOv3 features through a training-free baseline, FSSDINO, utilizing class-specific prototypes and Gram-matrix refinement. Our results across binary, multi-class, and cross-domain (CDFSS) benchmarks demonstrate that this minimal approach, applied to the final backbone layer, is highly competitive with specialized methods involving complex decoders or test-time adaptation. Crucially, we conduct an Oracle-guided layer analysis, identifying a significant performance gap between the standard last-layer features and globally optimal intermediate representations. We reveal a "Safest vs. Optimal" dilemma: while the Oracle proves higher performance is attainable, matching…
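
As a rough illustration of the two ingredients named in the abstract, the NumPy sketch below builds a class prototype by masked average pooling over support patch features, scores query patches by cosine similarity, and then smooths those scores with the query's own patch Gram matrix. The refinement step in particular is an assumption: the abstract does not specify how the Gram matrix is used, so `gram_refine`, its `temperature` parameter, and the `threshold` are hypothetical stand-ins rather than the authors' method. Feature extraction from DINOv3 is omitted; inputs are plain arrays of patch features.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def masked_average_prototype(support_feats, support_mask):
    """support_feats: (N, D) patch features; support_mask: (N,) foreground weights."""
    weights = support_mask.astype(np.float64)
    return (weights[:, None] * support_feats).sum(axis=0) / (weights.sum() + 1e-8)


def prototype_similarity(query_feats, prototype):
    """Cosine similarity of each query patch to the prototype; shape (M,)."""
    return l2_normalize(query_feats) @ l2_normalize(prototype)


def gram_refine(scores, query_feats, temperature=0.1):
    """Hypothetical refinement: average scores over patches that look alike.

    The (M, M) Gram matrix of normalized query features is row-softmaxed
    into an affinity matrix, which then propagates the prototype scores.
    """
    q = l2_normalize(query_feats)
    logits = (q @ q.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    affinity = np.exp(logits)
    affinity /= affinity.sum(axis=1, keepdims=True)
    return affinity @ scores


def segment(query_feats, support_feats, support_mask, threshold=0.5):
    """Return a boolean foreground decision per query patch."""
    proto = masked_average_prototype(support_feats, support_mask)
    scores = gram_refine(prototype_similarity(query_feats, proto), query_feats)
    return scores > threshold
```

No parameters are trained anywhere in this sketch, which is the sense in which such a pipeline is training-free; only the frozen backbone features and the support mask determine the output.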

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications