DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines

Xin Jiang; Hao Tang; Rui Yan; Jinhui Tang; Zechao Li

arXiv:2404.15771 · cs.CV · April 25, 2024 · 1 citation

Open Access

TL;DR

This paper introduces DVF, a fine-grained image retrieval model built on three practical guidelines (emphasize the object, highlight subcategory-specific discrepancies, and train effectively) that achieves state-of-the-art results on three FGIR datasets.

Contribution

The paper proposes a set of practical guidelines for FGIR and, following them, a dual visual filtering mechanism and a discriminative training strategy that enhance model discriminability and generalization.

Findings

1. DVF achieves state-of-the-art performance on three FGIR datasets.

2. The dual visual filtering mechanism effectively captures subcategory-specific discrepancies.

3. The proposed guidelines improve discriminability and generalization in FGIR models.

Abstract

Fine-grained image retrieval (FGIR) aims to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis that leads to practical guidelines for identifying subcategory-specific discrepancies and generating discriminative features when designing effective FGIR models. These guidelines are: emphasizing the object (G1), highlighting subcategory-specific discrepancies (G2), and employing an effective training strategy (G3). Following G1 and G2, we design a novel Dual Visual Filtering mechanism for the plain visual transformer, denoted as DVF, to capture subcategory-specific discrepancies. Specifically, the dual visual filtering mechanism comprises an object-oriented module and a…
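
For readers new to the task, the retrieval setting the abstract refers to can be illustrated with a minimal sketch. This is not the DVF method: it assumes that some model has already produced one embedding per image, ranks a gallery by cosine similarity to each query, and scores the ranking with Recall@K, a metric commonly used for retrieval. The metric choice, the function names, and the toy 2-D embeddings are all assumptions made for illustration and do not come from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def cosine_rank(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(g))) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)

def recall_at_k(queries, query_labels, gallery, gallery_labels, k):
    """Fraction of queries with at least one same-label item in the top k."""
    hits = 0
    for q, label in zip(queries, query_labels):
        top_k = cosine_rank(q, gallery)[:k]
        if any(gallery_labels[i] == label for i in top_k):
            hits += 1
    return hits / len(queries)

# Toy 2-D embeddings standing in for model outputs (hypothetical data).
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
gallery_labels = ["a", "a", "b", "b"]
queries = [[1.0, 0.05], [0.05, 1.0]]
query_labels = ["a", "b"]

r1 = recall_at_k(queries, query_labels, gallery, gallery_labels, k=1)
print(f"Recall@1 = {r1:.2f}")
```

In this setting, a more discriminative embedding pulls same-subcategory images closer to the query than visually similar images from other subcategories, which is what raises Recall@K.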

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications