Hubness Reduction with Dual Bank Sinkhorn Normalization for Cross-Modal Retrieval
Zhengxin Pan, Haishuai Wang, Fangyu Wu, Peng Zhang, Jiajun Bu

TL;DR
This paper introduces Dual Bank Sinkhorn Normalization, a novel method to reduce hubness in cross-modal retrieval by balancing query and target probabilities, leading to improved retrieval accuracy across multiple modalities.
Contribution
It proposes a dual bank normalization framework that extends Sinkhorn Normalization to better handle distributional gaps in cross-modal retrieval tasks.
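Sinkhorn normalization, the technique this contribution builds on, exponentiates a query-by-target similarity matrix and then alternately rescales its rows and columns so that every query and every target ends up with a fixed share of probability mass. The snippet is a minimal sketch of plain single-matrix Sinkhorn normalization in pure Python. It is not the paper's dual bank variant. The uniform marginals, the temperature `beta`, the iteration count, and the helper names `sinkhorn_normalize` and `top1` are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn_normalize(sim: Matrix, beta: float = 20.0, n_iters: int = 200) -> Matrix:
    """Plain Sinkhorn normalization of a query-by-target similarity matrix.

    Assumptions (not from the paper): uniform marginals, so each query row
    should carry mass 1 / n_q and each target column mass 1 / n_t; similarities
    lie roughly in [-1, 1] (e.g. cosine scores), so exp() does not underflow.
    """
    n_q, n_t = len(sim), len(sim[0])
    # Exponentiate with temperature beta; subtracting the global max avoids overflow.
    m = max(max(row) for row in sim)
    p = [[math.exp(beta * (s - m)) for s in row] for row in sim]
    for _ in range(n_iters):
        # Row step: rescale each query so its row sums to 1 / n_q.
        for i in range(n_q):
            r = sum(p[i])
            p[i] = [v / (r * n_q) for v in p[i]]
        # Column step: rescale each target so its column sums to 1 / n_t.
        for j in range(n_t):
            c = sum(p[i][j] for i in range(n_q))
            for i in range(n_q):
                p[i][j] /= c * n_t
    return p


def top1(scores: Matrix) -> List[int]:
    """Index of the highest-scoring target for each query."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


if __name__ == "__main__":
    # Toy example: target 0 is the nearest neighbor of both queries (a hub).
    sim = [[0.9, 0.85],
           [0.9, 0.30]]
    print(top1(sim))                      # [0, 0]
    print(top1(sinkhorn_normalize(sim)))  # [1, 0]
```

In the toy matrix, raw nearest-neighbor search sends both queries to target 0. Once every target is forced to carry the same total mass, query 0 is redirected to target 1, which it scores almost as highly. The row step is what separates this from normalizing targets alone: it also pins the mass assigned to each query.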
Findings
Consistent performance improvements across image-text, video-text, and audio-text retrieval.
Effective hubness reduction demonstrated through comprehensive evaluations.
Enhanced probability balancing improves retrieval precision.
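Hubness is the tendency of a few targets to appear as nearest neighbors of many queries. A common way to quantify it is the k-occurrence N_k(j), the number of queries whose top-k list contains target j, together with the skewness of the N_k distribution; a strongly right-skewed distribution signals hubs. This page does not say which hubness measure the paper reports, so the snippet is a generic sketch under that assumption. The helper names `k_occurrence` and `skewness` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def k_occurrence(sim: Matrix, k: int = 1) -> List[int]:
    """N_k(j): how many queries rank target j among their top-k targets."""
    n_t = len(sim[0])
    counts = [0] * n_t
    for row in sim:
        top = sorted(range(n_t), key=lambda j: row[j], reverse=True)[:k]
        for j in top:
            counts[j] += 1
    return counts


def skewness(xs: List[int]) -> float:
    """Population skewness E[(x - mu)^3] / sigma^3; 0.0 for a constant list."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    third = sum((x - mu) ** 3 for x in xs) / n
    return third / var ** 1.5


if __name__ == "__main__":
    # Every query's nearest target is target 0, so target 0 is a hub.
    hub_sim = [[0.90, 0.5, 0.4, 0.30],
               [0.80, 0.7, 0.1, 0.20],
               [0.85, 0.2, 0.6, 0.10],
               [0.95, 0.3, 0.2, 0.75]]
    counts = k_occurrence(hub_sim, k=1)
    print(counts)                      # [4, 0, 0, 0]
    print(round(skewness(counts), 4))  # 1.1547
```

A perfectly balanced retrieval, where each target is the top-1 match of exactly one query, gives N_1 = [1, 1, 1, 1] and a skewness of 0, so hubness reduction shows up as the skewness moving toward zero.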
Abstract
The past decade has witnessed rapid advancements in cross-modal retrieval, with significant progress made in accurately measuring the similarity between cross-modal pairs. However, the persistent hubness problem, a phenomenon where a small number of targets frequently appear as nearest neighbors to numerous queries, continues to hinder the precision of similarity measurements. Despite several proposed methods to reduce hubness, their underlying mechanisms remain poorly understood. To bridge this gap, we analyze the widely adopted Inverted Softmax approach and demonstrate its effectiveness in balancing target probabilities during retrieval. Building on these insights, we propose a probability-balancing framework for more effective hubness reduction. We contend that balancing target probabilities alone is inadequate and, therefore, extend the framework to balance both query and target…
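The Inverted Softmax mentioned in the abstract balances target probabilities by normalizing each target's scores over a bank of queries instead of normalizing each query's scores over the targets: P(i | j) = exp(beta * s_ij) / sum_m exp(beta * s_mj). Every target therefore carries the same total mass, and a hub's scores get spread thin. The snippet is a minimal sketch of that commonly used formulation, assuming the bank is simply the current set of queries; the temperature `beta` and the helper names `inverted_softmax` and `top1` are illustrative choices, not taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def inverted_softmax(sim: Matrix, beta: float = 20.0) -> Matrix:
    """Inverted softmax: normalize each target column over the query bank.

    sim[i][j] is the similarity of query i to target j. Assumption: the query
    bank is just the rows of `sim`. Each output column sums to 1.
    """
    n_q, n_t = len(sim), len(sim[0])
    out = [[0.0] * n_t for _ in range(n_q)]
    for j in range(n_t):
        # Subtract the column max before exponentiating for numerical stability.
        col_max = max(sim[i][j] for i in range(n_q))
        exps = [math.exp(beta * (sim[i][j] - col_max)) for i in range(n_q)]
        total = sum(exps)
        for i in range(n_q):
            out[i][j] = exps[i] / total  # P(query i | target j)
    return out


def top1(scores: Matrix) -> List[int]:
    """Index of the highest-scoring target for each query."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


if __name__ == "__main__":
    # Toy example: target 0 is the nearest neighbor of both queries (a hub).
    sim = [[0.9, 0.85],
           [0.9, 0.30]]
    print(top1(sim))                    # [0, 0]
    print(top1(inverted_softmax(sim)))  # [1, 0]
```

Because target 0 is equally close to both queries, its column splits into 0.5 and 0.5, while target 1's column puts nearly all of its mass on query 0. Query 0 is then redirected away from the hub. Note that only the columns are constrained here; each query's row is left unnormalized.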
