Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks
Yimu Wang, Xiangru Jian, Bo Xue

TL;DR
This paper introduces Dual Bank Normalization (DBNorm), a post-processing framework that mitigates hubness in cross-modal retrieval by normalizing similarities with banks of both query and gallery samples, improving retrieval accuracy across text, image, video, and audio.
Contribution
The paper proposes DBNorm, a framework that uses both query and gallery banks to reduce hubness, together with two similarity normalization methods, dual inverted softmax and dual dynamic inverted softmax, that improve cross-modal retrieval performance.
Findings
DBNorm effectively reduces hubness in cross-modal retrieval.
Proposed methods outperform previous approaches on diverse benchmarks.
Significant improvements in retrieval accuracy across text, image, video, and audio modalities.
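Hubness is commonly quantified with the k-occurrence N_k(g), the number of queries whose top-k results contain gallery item g, and the skewness of the N_k distribution: a strongly right-skewed distribution means a few hubs absorb most retrievals. The plain-Python sketch below computes both. The function names are illustrative, and this summary does not state which hubness metric or value of k the paper reports.

```python
from typing import List


def k_occurrence(sim: List[List[float]], k: int) -> List[int]:
    """Count how often each gallery item appears in a query's top-k list.

    sim[i][j] is the similarity between query i and gallery item j.
    """
    n_gallery = len(sim[0])
    counts = [0] * n_gallery
    for row in sim:
        # Indices of the k most similar gallery items for this query.
        top = sorted(range(n_gallery), key=lambda j: row[j], reverse=True)[:k]
        for j in top:
            counts[j] += 1
    return counts


def skewness(values: List[int]) -> float:
    """Population skewness (third standardized moment) of k-occurrence counts."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0  # every item retrieved equally often: no hubs
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5
```

Computing the skewness on the similarity matrix before and after a normalization step gives a simple check of whether that step reduced hubness.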
Abstract
In this work, we present a post-processing solution to address the hubness problem in cross-modal retrieval, a phenomenon where a small number of gallery data points are retrieved disproportionately often, resulting in a decline in retrieval performance. We first theoretically demonstrate the necessity of incorporating both the gallery and query data for addressing hubness, as hubs always exhibit high similarity with gallery and query data. Second, building on our theoretical results, we propose a novel framework, Dual Bank Normalization (DBNorm). While previous work has attempted to alleviate hubness by utilizing only the query samples, DBNorm leverages two banks constructed from the query and gallery samples to reduce the occurrence of hubs during inference. Next, to complement DBNorm, we introduce two novel methods, dual inverted softmax and dual dynamic inverted softmax, for normalizing similarity…
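The abstract contrasts DBNorm with earlier query-bank normalization (e.g. QB-Norm), which rescales each gallery item's score by its similarity to a bank of queries using an inverted softmax. The plain-Python sketch below shows that query-bank inverted softmax and one hypothetical way to add a gallery bank: multiplying a query-bank and a gallery-bank inverted softmax. The combination rule, the temperatures `beta_q`/`beta_g`, and all function names are assumptions for illustration; the truncated abstract does not give the exact dual inverted softmax or dual dynamic inverted softmax formulas.

```python
import math
from typing import List

Matrix = List[List[float]]


def inverted_softmax(sim: Matrix, bank_sim: Matrix, beta: float) -> Matrix:
    """Bank-based inverted softmax.

    sim[i][j]      similarity between test query i and gallery item j
    bank_sim[b][j] similarity between bank sample b and gallery item j
    beta           inverse temperature (assumed hyperparameter)

    Each gallery column is divided by its summed exp-similarity to the bank,
    so gallery items close to many bank samples (likely hubs) are down-weighted.
    """
    n_gallery = len(sim[0])
    denom = [
        sum(math.exp(beta * row[j]) for row in bank_sim) for j in range(n_gallery)
    ]
    return [
        [math.exp(beta * row[j]) / denom[j] for j in range(n_gallery)] for row in sim
    ]


def dual_inverted_softmax(
    sim: Matrix,
    query_bank_sim: Matrix,
    gallery_bank_sim: Matrix,
    beta_q: float,
    beta_g: float,
) -> Matrix:
    """HYPOTHETICAL dual-bank variant, not the paper's published formula.

    Multiplies a query-bank and a gallery-bank inverted softmax, so a gallery
    item is penalized both for high similarity to the query bank and,
    separately, for high similarity to the gallery bank.
    """
    by_query = inverted_softmax(sim, query_bank_sim, beta_q)
    by_gallery = inverted_softmax(sim, gallery_bank_sim, beta_g)
    return [[a * b for a, b in zip(rq, rg)] for rq, rg in zip(by_query, by_gallery)]
```

With a gallery item that is close to every bank sample, the raw top-1 result for a test query can be that hub, while after normalization a less "popular" item with nearly the same raw score ranks first. The temperatures are hyperparameters in this sketch.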
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Softmax
