Siamese Transformer Networks for Few-shot Image Classification

Weihao Jiang; Shuoxi Zhang; Kun He

arXiv:2408.01427·cs.CV·August 6, 2024

Open Access

TL;DR

This paper introduces a Siamese Transformer Network that combines global and local features using pre-trained Vision Transformers for improved few-shot image classification, demonstrating superior results on multiple benchmarks.

Contribution

The paper proposes a Siamese Transformer Network (STN) architecture that integrates global and local features and is trained with a meta-learning strategy, improving few-shot classification performance.

Findings

01. Achieves superior accuracy on four benchmarks.

02. Effectively combines global and local features.

03. Outperforms state-of-the-art methods in 5-shot and 1-shot scenarios.

Abstract

Humans exhibit remarkable proficiency in visual classification tasks, accurately recognizing and classifying new images with minimal examples. This ability is attributed to their capacity to focus on details and identify common features between previously seen and new images. In contrast, existing few-shot image classification methods often emphasize either global features or local features, with few studies considering the integration of both. To address this limitation, we propose a novel approach based on the Siamese Transformer Network (STN). Our method employs two parallel branch networks utilizing the pre-trained Vision Transformer (ViT) architecture to extract global and local features, respectively. Specifically, we implement the ViT-Small network architecture and initialize the branch networks with pre-trained model parameters obtained through self-supervised learning. We apply…
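The two-branch idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract is cut off before it states how features are compared, so the cosine similarity, the best-match averaging over patches, the mixing weight `alpha`, and every function name here are assumptions made for illustration. The hand-written vectors stand in for what the global branch (one vector per image) and the local branch (one vector per patch) of a pre-trained ViT would produce.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_vector(vectors):
    """Element-wise mean of a list of vectors (a class prototype)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def global_score(query_global, support_globals):
    # Assumption: compare the query's global vector with the class prototype.
    return cosine(query_global, mean_vector(support_globals))

def local_score(query_patches, support_patch_sets):
    # Assumption: each query patch is matched to its most similar support
    # patch of the class, and the best-match similarities are averaged.
    support_patches = [p for patches in support_patch_sets for p in patches]
    best = [max(cosine(q, s) for s in support_patches) for q in query_patches]
    return sum(best) / len(best)

def classify(query, support, alpha=0.5):
    """Score each class by a weighted sum of global and local similarity.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    scores = {}
    for label, examples in support.items():
        g = global_score(query["global"], [e["global"] for e in examples])
        l = local_score(query["patches"], [e["patches"] for e in examples])
        scores[label] = alpha * g + (1 - alpha) * l
    return max(scores, key=scores.get), scores

# Toy 2-way 1-shot episode with made-up 2-D features.
support = {
    "cat": [{"global": [1.0, 0.0], "patches": [[1.0, 0.0], [0.9, 0.1]]}],
    "dog": [{"global": [0.0, 1.0], "patches": [[0.0, 1.0], [0.1, 0.9]]}],
}
query = {"global": [0.8, 0.2], "patches": [[1.0, 0.1], [0.7, 0.3]]}

predicted, scores = classify(query, support)
print(predicted, scores)
```

In this toy episode the query is closer to the "cat" example in both its global vector and its patches, so both scores agree. The point of combining them is for cases where they disagree, for example when two classes look alike overall but differ in a small detail.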

Taxonomy

Topics: Image Processing Techniques and Applications · Spectroscopy Techniques in Biomedical and Chemical Research

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Vision Transformer