MetaFormer: A Unified Meta Framework for Fine-Grained Recognition

Qishuai Diao; Yi Jiang; Bin Wen; Jia Sun; Zehuan Yuan

arXiv:2203.02751 · cs.CV · March 8, 2022 · 27 cites

TL;DR

MetaFormer is a unified framework that effectively integrates various meta-information with visual data to significantly improve fine-grained visual classification accuracy across multiple datasets.

Contribution

It introduces a simple yet powerful meta-framework that jointly learns from vision and diverse meta-information, setting new state-of-the-art results in fine-grained recognition.

Findings

01

Using only vision data, MetaFormer outperforms current state-of-the-art (SOTA) methods.

02

Adding meta-information further boosts performance beyond existing SOTA.

03

MetaFormer achieves high accuracy on multiple fine-grained datasets, including iNaturalist, CUB-200-2011, and NABirds.

Abstract

Fine-Grained Visual Classification (FGVC) is the task of recognizing objects that belong to multiple subordinate categories of a super-category. Recent state-of-the-art methods usually design sophisticated learning pipelines to tackle this task. However, visual information alone is often not sufficient to accurately differentiate between fine-grained visual categories. Nowadays, meta-information (e.g., spatio-temporal prior, attribute, and text description) usually appears along with the images. This inspires us to ask: is it possible to use a unified and simple framework to utilize various meta-information to assist in fine-grained identification? To answer this question, we explore a unified and strong meta-framework (MetaFormer) for fine-grained visual classification. In practice, MetaFormer provides a simple yet effective approach to address the joint…
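The abstract names spatio-temporal priors as one kind of meta-information but, as excerpted here, does not say how they are combined with image features. The sketch below illustrates one plausible reading and should not be taken as the authors' implementation: it assumes the location and date are encoded with sine and cosine features, projected to the vision token width, and appended to the vision token sequence as one extra token. The function names, the 6-dimensional encoding, and the token width of 8 are all hypothetical choices made for illustration.

```python
import math
import random


def encode_spatiotemporal(lat_deg, lon_deg, day_of_year):
    """Map latitude, longitude, and date to a 6-dim cyclic feature vector.

    Assumption: a sin/cos encoding is used so that nearby longitudes and
    dates (e.g. Dec 31 and Jan 1) get nearby features.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    date = 2.0 * math.pi * day_of_year / 365.0
    return [
        math.sin(lat), math.cos(lat),
        math.sin(lon), math.cos(lon),
        math.sin(date), math.cos(date),
    ]


def linear(x, weight):
    """Project vector x (length in_dim) with weight (out_dim rows of in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fuse_tokens(vision_tokens, meta_features, meta_weight):
    """Return a new sequence: the vision tokens plus one projected meta token.

    Assumption: fusion is plain concatenation along the sequence axis, so a
    downstream attention layer (not shown) could mix the two sources.
    """
    meta_token = linear(meta_features, meta_weight)
    return vision_tokens + [meta_token]


random.seed(0)
dim = 8  # hypothetical token width
# Stand-ins for image patch embeddings and a learned projection matrix.
vision_tokens = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(4)]
meta_weight = [[random.gauss(0.0, 0.02) for _ in range(6)] for _ in range(dim)]

meta = encode_spatiotemporal(37.77, -122.42, 120)
fused = fuse_tokens(vision_tokens, meta, meta_weight)
print(len(fused), len(fused[-1]))  # prints: 5 8
```

Treating meta-information as an additional token keeps the vision pathway unchanged, which is why the same sketch runs with or without the extra input; other kinds of meta-information (attributes, text) would need their own encoders, which are not shown here.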

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: MetaFormer