Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification

Bo Zhang; Jiakang Yuan; Baopu Li; Tao Chen; Jiayuan Fan; Botian Shi

arXiv:2207.00784 · cs.CV · July 5, 2022 · 1 citation

TL;DR

This paper introduces HelixFormer, a Transformer-based model that effectively mines cross-image semantic relations to improve few-shot fine-grained image classification, outperforming existing methods on multiple benchmarks.

Contribution

The paper proposes a novel double-helix Transformer architecture with relation mining and representation enhancement processes for better cross-image semantic relation modeling in few-shot learning.

Findings

01 HelixFormer achieves superior accuracy on five fine-grained benchmarks.

02 The model outperforms state-of-the-art methods in 1-shot and 5-shot scenarios.

03 Extensive experiments validate the effectiveness of cross-image relation mining.
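
The 1-shot and 5-shot scenarios refer to how many labeled support images each class contributes to an evaluation episode. A minimal sketch of N-way K-shot episode sampling follows. The `sample_episode` helper, its parameters, the query count, and the toy data are illustrative assumptions and are not taken from the paper or its repository.

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15, seed=None):
    """Sample one N-way K-shot episode (hypothetical helper).

    dataset: dict mapping a class name to a list of image identifiers.
    Returns (support, query), each a list of (image_id, episode_label) pairs.
    """
    rng = random.Random(seed)
    # Pick N classes for this episode; labels are re-indexed 0..N-1.
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw K support and n_query query images without overlap.
        images = rng.sample(dataset[cls], k_shot + n_query)
        support += [(img, label) for img in images[:k_shot]]
        query += [(img, label) for img in images[k_shot:]]
    return support, query

# Toy data (assumption): 10 classes with 20 image ids each.
toy = {f"class_{c}": [f"class_{c}/img_{i}.jpg" for i in range(20)]
       for c in range(10)}

support_1shot, query_1shot = sample_episode(toy, k_shot=1, seed=0)
support_5shot, query_5shot = sample_episode(toy, k_shot=5, seed=0)
```

In the 5-way 1-shot call the support set holds one image per class, and in the 5-shot call it holds five; the query images are the ones to be classified against that support set.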

Abstract

Few-shot fine-grained learning aims to classify a query image into one of a set of support categories with fine-grained differences. Although learning different objects' local differences via Deep Neural Networks has achieved success, how to exploit query-support cross-image object semantic relations in Transformer-based architectures remains under-explored in the few-shot fine-grained scenario. In this work, we propose a Transformer-based double-helix model, namely HelixFormer, that mines cross-image object semantic relations in a bidirectional and symmetrical manner. HelixFormer consists of two steps: 1) a Relation Mining Process (RMP) across different branches, and 2) a Representation Enhancement Process (REP) within each individual branch. Through the RMP, each branch can extract fine-grained object-level Cross-image Semantic Relation Maps (CSRMs) using information…
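
The abstract is truncated here, so the exact formulation of the RMP and REP is not available in this listing. As a rough illustration of what bidirectional, symmetrical relation mining between a query branch and a support branch could look like, the sketch below uses plain scaled dot-product cross-attention with no learned projections, and a residual addition as a stand-in for within-branch enhancement. All function names, shapes, and the residual step are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(target, source):
    """Attend from `target` tokens to `source` tokens.

    target: (n_target, dim), source: (n_source, dim).
    Returns a relation map (n_target, n_source) and attended
    features (n_target, dim).
    """
    dim = target.shape[-1]
    relation = softmax(target @ source.T / np.sqrt(dim), axis=-1)
    return relation, relation @ source

def bidirectional_relation(query_feat, support_feat):
    # Symmetric mining: each branch attends to the other branch.
    q_rel, q_ctx = cross_attend(query_feat, support_feat)
    s_rel, s_ctx = cross_attend(support_feat, query_feat)
    # Assumed stand-in for enhancement: add the cross-image
    # context back onto each branch's own features.
    return (q_rel, query_feat + q_ctx), (s_rel, support_feat + s_ctx)

# Toy inputs (assumption): 5x5 feature maps flattened to 25 tokens of dim 64.
rng = np.random.default_rng(0)
query_feat = rng.standard_normal((25, 64))
support_feat = rng.standard_normal((25, 64))

(q_rel, q_out), (s_rel, s_out) = bidirectional_relation(query_feat, support_feat)
print(q_rel.shape, q_out.shape)
```

Each row of `q_rel` is a distribution over support-image locations for one query-image location, which is one simple way to read a cross-image relation map; the official repository listed under Code & Models is the authoritative reference for the actual design.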

Code & Models

Repositories

jiakangyuan/helixformer (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications