Hierarchical Matching and Reasoning for Multi-Query Image Retrieval
Zhong Ji, Zhihao Li, Yan Zhang, Haoran Wang, Yanwei Pang, Xuelong Li

TL;DR
This paper introduces a Hierarchical Matching and Reasoning Network (HMRN) for Multi-Query Image Retrieval that captures multi-level similarities and high-level correlations, significantly improving retrieval accuracy over existing methods.
Contribution
The paper proposes a novel HMRN model that disentangles MQIR into hierarchical semantic representations and combines scalar matching with vector reasoning modules for enhanced performance.
Findings
HMRN outperforms state-of-the-art methods on benchmark datasets.
The R@1 metric improves by 23.4% over the best existing method.
The model effectively captures fine-grained, global, and high-level semantic correlations.
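For readers unfamiliar with the R@1 figure above, here is a minimal sketch of how Recall@K is commonly computed in retrieval benchmarks. It is not taken from the paper: the function name `recall_at_k`, the toy rankings, and the assumption of exactly one ground-truth image per query set are all illustrative. The summary does not say whether the reported gain is absolute or relative, and the sketch does not settle that.

```python
from typing import Sequence


def recall_at_k(rankings: Sequence[Sequence[int]],
                targets: Sequence[int],
                k: int) -> float:
    """Fraction of query sets whose ground-truth image is in the top-k results.

    rankings[i] lists candidate image ids for query set i, best match first.
    targets[i] is the single ground-truth image id for query set i
    (one target per query set is an assumption of this sketch).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not rankings:
        return 0.0
    hits = sum(1 for ranked, target in zip(rankings, targets)
               if target in ranked[:k])
    return hits / len(rankings)


# Hypothetical toy data: four query sets ranked over image ids 0..3.
rankings = [[3, 1, 2], [0, 2, 1], [2, 0, 1], [1, 3, 0]]
targets = [3, 2, 1, 1]

r1 = recall_at_k(rankings, targets, 1)  # top-1 hits for query sets 0 and 3
r2 = recall_at_k(rankings, targets, 2)  # query set 1 also hits within top-2
print(f"R@1 = {r1:.2f}, R@2 = {r2:.2f}")
```

R@1 is the strictest setting, since only the top-ranked image counts as a hit.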
Abstract
As a promising field, Multi-Query Image Retrieval (MQIR) aims at searching for the semantically relevant image given multiple region-specific text queries. Existing works mainly focus on a single-level similarity between image regions and text queries, which neglects the hierarchical guidance of multi-level similarities and results in incomplete alignments. Besides, the high-level semantic correlations that intrinsically connect different region-query pairs are rarely considered. To address the above limitations, we propose a novel Hierarchical Matching and Reasoning Network (HMRN) for MQIR. It disentangles MQIR into three hierarchical semantic representations, which are responsible for capturing fine-grained local details, contextual global scopes, and high-level inherent correlations. HMRN comprises two modules: a Scalar-based Matching (SM) module and a Vector-based Reasoning (VR) module.…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
