DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines
Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li

TL;DR
This paper introduces DVF, a fine-grained image retrieval (FGIR) model built on three practical guidelines: emphasize the object, highlight subcategory-specific discrepancies, and employ an effective training strategy. It achieves state-of-the-art results on three FGIR datasets.
Contribution
The paper proposes a set of practical guidelines for FGIR and, following them, a dual visual filtering mechanism and a discriminative training strategy that enhance model discriminability and generalization.
Findings
DVF achieves state-of-the-art performance on three FGIR datasets.
The dual visual filtering mechanism effectively captures subcategory-specific discrepancies.
The proposed guidelines improve discriminability and generalization in FGIR models.
Abstract
Fine-grained image retrieval (FGIR) aims to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models. These guidelines include emphasizing the object (G1), highlighting subcategory-specific discrepancies (G2), and employing an effective training strategy (G3). Following G1 and G2, we design a novel Dual Visual Filtering mechanism for the plain visual transformer, denoted as DVF, to capture subcategory-specific discrepancies. Specifically, the dual visual filtering mechanism comprises an object-oriented module and a…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
