MetaFormer: A Unified Meta Framework for Fine-Grained Recognition
Qishuai Diao, Yi Jiang, Bin Wen, Jia Sun, Zehuan Yuan

TL;DR
MetaFormer is a unified framework that effectively integrates various meta-information with visual data to significantly improve fine-grained visual classification accuracy across multiple datasets.
Contribution
It introduces a simple yet powerful meta-framework that jointly learns from vision and diverse meta-information, setting new state-of-the-art results in fine-grained recognition.
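As a rough illustration of what jointly learning from vision and meta-information can look like, the sketch below turns a spatio-temporal prior into one extra token and appends it to a list of image tokens. This is not the authors' implementation. The function names (`encode_spatiotemporal`, `project`, `fuse_tokens`), the sin/cos encoding, the token dimension, and the random weights standing in for learned parameters are all assumptions made for illustration.

```python
import math
import random


def encode_spatiotemporal(lat_deg, lon_deg, day_of_year):
    """Map latitude, longitude, and date to six bounded features (hypothetical encoding)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    t = 2.0 * math.pi * day_of_year / 365.0
    # sin/cos pairs keep periodic quantities (longitude, date) continuous at wrap-around.
    return [math.sin(lat), math.cos(lat),
            math.sin(lon), math.cos(lon),
            math.sin(t), math.cos(t)]


def project(vec, weights):
    """Linear projection; `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def fuse_tokens(vision_tokens, meta_vec, weights):
    """Return a new token list: the vision tokens followed by one meta token."""
    meta_token = project(meta_vec, weights)
    return vision_tokens + [meta_token]


rng = random.Random(0)
dim = 8  # assumed token width

# Random values stand in for learned projection weights (assumption).
weights = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(dim)]
# Random values stand in for image patch embeddings (assumption).
vision_tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(4)]

meta_vec = encode_spatiotemporal(37.77, -122.42, 120)
tokens = fuse_tokens(vision_tokens, meta_vec, weights)
print(len(tokens), len(tokens[0]))  # 5 tokens, each of width 8
```

In a setup like this, a downstream sequence model would see image and meta tokens side by side; other meta-information such as attributes or text would need its own encoder in place of `encode_spatiotemporal`.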
Findings
Using only vision data, MetaFormer outperforms current SOTA methods.
Adding meta-information further boosts performance beyond existing SOTA.
Achieves high accuracy on multiple fine-grained datasets, including iNaturalist, CUB-200-2011, and NABirds.
Abstract
Fine-Grained Visual Classification (FGVC) is the task that requires recognizing objects belonging to multiple subordinate categories of a super-category. Recent state-of-the-art methods usually design sophisticated learning pipelines to tackle this task. However, visual information alone is often not sufficient to accurately differentiate between fine-grained visual categories. Nowadays, meta-information (e.g., spatio-temporal prior, attribute, and text description) usually appears along with the images. This inspires us to ask the question: Is it possible to use a unified and simple framework to utilize various meta-information to assist in fine-grained identification? To answer this question, we explore a unified and strong meta-framework (MetaFormer) for fine-grained visual classification. In practice, MetaFormer provides a simple yet effective approach to address the joint…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: MetaFormer
