ViT-ProtoNet for Few-Shot Image Classification: A Multi-Benchmark Evaluation

Abdulvahap Mutlu; Şengül Doğan; Türker Tuncer

arXiv:2507.09299·cs.CV·July 15, 2025

TL;DR

This paper introduces ViT-ProtoNet, a few-shot image classification method that combines a ViT-Small backbone with Prototypical Networks. Across four benchmarks it outperforms CNN-based prototypical counterparts by up to 3.2% in 5-shot accuracy, establishing a new baseline for transformer-based meta-learning.

Contribution

The paper presents ViT-ProtoNet, which integrates a ViT-Small backbone into Prototypical Networks, and evaluates it on Mini-ImageNet, FC100, CUB-200, and CIFAR-FS, including overlapped-support variants, to assess its effectiveness and robustness in few-shot classification.

Findings

01. Outperforms CNN-based prototypical counterparts by up to 3.2% in 5-shot accuracy.

02. Achieves superior feature separability in latent space.

03. Outperforms or matches transformer-based competitors while using a lightweight backbone.

Abstract

The remarkable representational power of Vision Transformers (ViTs) remains underutilized in few-shot image classification. In this work, we introduce ViT-ProtoNet, which integrates a ViT-Small backbone into the Prototypical Network framework. By averaging class-conditional token embeddings from a handful of support examples, ViT-ProtoNet constructs robust prototypes that generalize to novel categories under 5-shot settings. We conduct an extensive empirical evaluation on four standard benchmarks: Mini-ImageNet, FC100, CUB-200, and CIFAR-FS, including overlapped support variants to assess robustness. Across all splits, ViT-ProtoNet consistently outperforms CNN-based prototypical counterparts, achieving up to a 3.2% improvement in 5-shot accuracy and demonstrating superior feature separability in latent space. Furthermore, it outperforms or is competitive with transformer-based…
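
The prototype step described in the abstract (averaging support embeddings per class, then assigning a query to the nearest prototype) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the ViT-Small encoder is replaced by hand-written 2-D vectors, the squared Euclidean distance and softmax follow the standard Prototypical Network formulation and are assumed rather than confirmed for this paper, and the names `class_prototypes`, `squared_euclidean`, and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vec, label in zip(support_embeddings, support_labels):
        grouped[label].append(vec)
    prototypes = {}
    for label, vecs in grouped.items():
        dim = len(vecs[0])
        # Element-wise mean over all support vectors of this class.
        prototypes[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return prototypes


def squared_euclidean(a, b):
    """Squared Euclidean distance (assumed metric, as in standard ProtoNets)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query_embedding, prototypes):
    """Return (predicted_label, probabilities) via softmax over negative distances."""
    labels = list(prototypes)
    logits = [-squared_euclidean(query_embedding, prototypes[l]) for l in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = {l: e / total for l, e in zip(labels, exps)}
    predicted = max(probs, key=probs.get)
    return predicted, probs


# Toy 2-way, 2-shot episode; the 2-D vectors stand in for ViT embeddings.
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["cat", "cat", "dog", "dog"]
protos = class_prototypes(support, labels)
pred, probs = classify([1.0, 1.0], protos)
```

In a real 5-shot episode, each class would contribute five support embeddings produced by the backbone, and the same averaging and nearest-prototype rule would apply unchanged.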

Code & Models

Repositories

abdulvahapmutlu/vit-protonet
PyTorch · Official

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Image Processing Techniques and Applications · Advanced Image and Video Retrieval Techniques

Methods: Sparse Evolutionary Training