Progressive Multi-stage Interactive Training in Mobile Network for Fine-grained Recognition
Zhenxin Wu, Qingliang Chen, Yifeng Liu, Yinqi Zhang, Chengkai Zhu, Yang Yu

TL;DR
This paper introduces a lightweight training method for mobile networks that enhances fine-grained recognition by progressively integrating multi-stage features through recursive image mosaics, achieving improved accuracy and robustness.
Contribution
It proposes RMG-PMSI, a novel training approach combining recursive mosaics and multi-stage interaction to boost mobile network performance in fine-grained classification.
Findings
Significant accuracy improvements on benchmark datasets.
Enhanced robustness and transferability of the model.
Effective utilization of multi-stage features in lightweight networks.
Abstract
Fine-grained Visual Classification (FGVC) aims to identify objects from subcategories. It is a very challenging task because of the subtle inter-class differences. Existing research applies large-scale convolutional neural networks or visual transformers as the feature extractor, which is extremely computationally expensive. In practice, real-world fine-grained recognition often requires a more lightweight mobile network that can be used offline. However, the feature extraction capability of a basic mobile network is weaker than that of large-scale models. In this paper, based on the lightweight MobileNetV2, we propose a Progressive Multi-Stage Interactive training method with a Recursive Mosaic Generator (RMG-PMSI). First, we propose a Recursive Mosaic Generator (RMG) that generates images with different granularities in different phases. Then, the features of different stages…
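The abstract is cut off here and does not say how the RMG builds its mosaics. As a rough illustration of what "images with different granularities" could mean, the sketch below shuffles the four quadrants of an image and recurses into each quadrant, so a larger depth yields smaller shuffled tiles. The function name `recursive_mosaic`, the 2x2 quadrant split, and the `depth` parameter are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def recursive_mosaic(image, depth, rng):
    """Hypothetical mosaic generator (not the authors' implementation).

    Splits `image` (H x W x C) into four quadrants, applies the same
    procedure to each quadrant `depth - 1` more times, then reassembles
    the quadrants in a random order. depth=0 returns an unchanged copy.
    """
    if depth == 0:
        return image.copy()
    h, w = image.shape[:2]
    if h % 2 or w % 2:
        raise ValueError("height and width must be even at every level")
    hh, hw = h // 2, w // 2
    quadrants = [
        image[:hh, :hw], image[:hh, hw:],
        image[hh:, :hw], image[hh:, hw:],
    ]
    # Shuffle finer tiles first, then shuffle the quadrants themselves.
    quadrants = [recursive_mosaic(q, depth - 1, rng) for q in quadrants]
    a, b, c, d = (quadrants[i] for i in rng.permutation(4))
    top = np.concatenate([a, b], axis=1)
    bottom = np.concatenate([c, d], axis=1)
    return np.concatenate([top, bottom], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.arange(8 * 8 * 3).reshape(8, 8, 3)
    # One image per assumed training phase, from fine to coarse tiles.
    for depth in (2, 1, 0):
        mosaic = recursive_mosaic(img, depth, rng)
        print(depth, mosaic.shape)
```

Under this assumption, a progressive schedule would feed high-depth (fine-tile) mosaics to early training phases and the unshuffled image (depth 0) to the last phase; whether the paper uses this exact schedule is not stated in the text above.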
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
