Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information
Lingfeng Yang, Xiang Li, Renjie Song, Borui Zhao, Juntian Tao, Shihao Zhou, Jiajun Liang, Jian Yang

TL;DR
This paper introduces a dynamic MLP that leverages geographical and temporal information to improve fine-grained image classification, significantly enhancing discriminative features and achieving state-of-the-art results.
Contribution
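It presents what the authors describe as the first dynamic network approach for fine-grained classification, in which multimodal features (location and date) interact with the image representation at a higher and broader dimension than the single-dimension fusion used by existing methods, which it outperforms.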
Findings
Improves image representation discriminability
Achieves state-of-the-art accuracy on multiple datasets
Enhances visual recognizability of similar categories
Abstract
Fine-grained image classification is a challenging computer vision task in which various species share similar visual appearances, resulting in misclassification when predictions rely on visual cues alone. It is therefore helpful to leverage additional information, e.g., the locations and dates at which the images were captured, which is easily accessible but rarely exploited. In this paper, we first demonstrate that existing multimodal methods fuse multiple features only on a single dimension, which offers limited help for feature discrimination. To fully explore the potential of multimodal information, we propose a dynamic MLP on top of the image representation, which interacts with multimodal features at a higher and broader dimension. The dynamic MLP is an efficient structure parameterized by the learned embeddings of variable locations and dates. It can be regarded as an adaptive nonlinear…
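The idea of a layer "parameterized by the learned embeddings of variable locations and dates" can be illustrated with a small sketch. This is not the authors' architecture. It is a minimal NumPy example that assumes a sin/cos encoding of latitude, longitude, and day of year, and a single projection matrix that turns that encoding into per-sample layer weights. The names `encode_meta`, `DynamicLayer`, and `dynamic_weights` are hypothetical, and the projection is random here rather than learned.

```python
import numpy as np


def encode_meta(lat_deg, lon_deg, day_of_year):
    """Assumed encoding: scale each value to about [-1, 1], then take sin/cos."""
    vals = np.array([lat_deg / 90.0, lon_deg / 180.0, day_of_year / 365.0 * 2.0 - 1.0])
    return np.concatenate([np.sin(np.pi * vals), np.cos(np.pi * vals)])  # shape (6,)


class DynamicLayer:
    """Hypothetical layer whose weights are generated from the metadata vector."""

    def __init__(self, d_in, d_out, d_meta, rng):
        self.d_in, self.d_out = d_in, d_out
        # Maps a metadata vector to a flattened (d_in x d_out) weight matrix.
        # Random here; in a trained model this projection would be learned.
        self.proj = rng.normal(0.0, 0.1, size=(d_meta, d_in * d_out))

    def dynamic_weights(self, m):
        # m: (batch, d_meta) -> one weight matrix per sample: (batch, d_in, d_out)
        return (m @ self.proj).reshape(-1, self.d_in, self.d_out)

    def __call__(self, x, m):
        # x: (batch, d_in). Each image feature is multiplied by its own weights.
        y = np.einsum("bi,bio->bo", x, self.dynamic_weights(m))
        return np.maximum(y, 0.0)  # ReLU nonlinearity


rng = np.random.default_rng(0)
layer = DynamicLayer(d_in=8, d_out=4, d_meta=6, rng=rng)

# Two samples with the same image feature but different place and date.
x = np.tile(rng.normal(size=(1, 8)), (2, 1))
m = np.stack([encode_meta(40.0, -74.0, 150), encode_meta(-33.9, 151.2, 300)])
out = layer(x, m)
```

Because the weights depend on `m`, two visually identical inputs are transformed by different matrices when their location or date differs. Concatenating `m` to `x` before a fixed layer would instead add the metadata along a single feature dimension, which is the style of fusion the abstract describes as limited.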
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
