ViT-ProtoNet for Few-Shot Image Classification: A Multi-Benchmark Evaluation
Abdulvahap Mutlu, Şengül Doğan, Türker Tuncer

TL;DR
This paper introduces ViT-ProtoNet, a few-shot image classification method that uses a ViT-Small backbone inside the Prototypical Network framework. On four benchmarks it outperforms CNN-based prototypical counterparts by up to 3.2% in 5-shot accuracy, and it is proposed as a baseline for transformer-based meta-learning.
Contribution
The paper presents ViT-ProtoNet, integrating ViT-Small into Prototypical Networks, and provides extensive empirical evaluation showing its effectiveness and robustness in few-shot classification.
Findings
Outperforms CNN-based prototypical counterparts by up to 3.2% in 5-shot accuracy.
Achieves superior feature separability in latent space.
Outperforms or matches transformer-based competitors with a lightweight backbone.
Abstract
The remarkable representational power of Vision Transformers (ViTs) remains underutilized in few-shot image classification. In this work, we introduce ViT-ProtoNet, which integrates a ViT-Small backbone into the Prototypical Network framework. By averaging class-conditional token embeddings from a handful of support examples, ViT-ProtoNet constructs robust prototypes that generalize to novel categories under 5-shot settings. We conduct an extensive empirical evaluation on four standard benchmarks: Mini-ImageNet, FC100, CUB-200, and CIFAR-FS, including overlapped support variants to assess robustness. Across all splits, ViT-ProtoNet consistently outperforms CNN-based prototypical counterparts, achieving up to a 3.2% improvement in 5-shot accuracy and demonstrating superior feature separability in latent space. Furthermore, it outperforms or is competitive with transformer-based…
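The prototype step described in the abstract (average the support embeddings of each class, then assign each query to the nearest prototype) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: random vectors stand in for ViT-Small embeddings, the dimension of 384 is the usual ViT-Small width rather than a value given on this page, squared Euclidean distance is assumed as in the original Prototypical Networks formulation, and all function names are hypothetical.

```python
import random


def class_prototypes(support_emb, support_labels, n_way):
    """Return one prototype per class: the element-wise mean of its support embeddings."""
    dim = len(support_emb[0])
    sums = [[0.0] * dim for _ in range(n_way)]
    counts = [0] * n_way
    for emb, label in zip(support_emb, support_labels):
        counts[label] += 1
        for j, value in enumerate(emb):
            sums[label][j] += value
    return [[s / counts[c] for s in sums[c]] for c in range(n_way)]


def squared_euclidean(a, b):
    """Squared Euclidean distance (assumed metric; the page does not state one)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query_emb, prototypes):
    """Assign each query embedding to the index of its nearest prototype."""
    return [
        min(range(len(prototypes)), key=lambda c: squared_euclidean(q, prototypes[c]))
        for q in query_emb
    ]


def toy_episode(n_way=5, k_shot=5, n_query=3, dim=384, seed=0):
    """Build a synthetic episode; Gaussian clusters stand in for backbone outputs."""
    rng = random.Random(seed)
    centers = [[rng.gauss(0, 5) for _ in range(dim)] for _ in range(n_way)]

    def sample(c):
        # One fake embedding: class center plus unit-variance noise.
        return [m + rng.gauss(0, 1) for m in centers[c]]

    support_labels = [c for c in range(n_way) for _ in range(k_shot)]
    query_labels = [c for c in range(n_way) for _ in range(n_query)]
    support = [sample(c) for c in support_labels]
    query = [sample(c) for c in query_labels]
    return support, support_labels, query, query_labels


support, support_labels, query, query_labels = toy_episode()
prototypes = class_prototypes(support, support_labels, n_way=5)
predictions = classify(query, prototypes)
accuracy = sum(p == t for p, t in zip(predictions, query_labels)) / len(query_labels)
print(f"toy 5-way 5-shot accuracy: {accuracy:.2f}")
```

In a real pipeline the synthetic vectors would be replaced by embeddings produced by the backbone for each support and query image; the averaging and nearest-prototype steps stay the same.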
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Image Processing Techniques and Applications · Advanced Image and Video Retrieval Techniques
Methods: Sparse Evolutionary Training
