Hubness Reduction with Dual Bank Sinkhorn Normalization for Cross-Modal Retrieval

Zhengxin Pan; Haishuai Wang; Fangyu Wu; Peng Zhang; Jiajun Bu

arXiv:2508.02538 · cs.IR · August 5, 2025

TL;DR

This paper introduces Dual Bank Sinkhorn Normalization, a novel method to reduce hubness in cross-modal retrieval by balancing query and target probabilities, leading to improved retrieval accuracy across multiple modalities.

Contribution

The paper proposes a dual bank normalization framework that extends Sinkhorn Normalization to better handle distributional gaps in cross-modal retrieval tasks.
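To make "balancing both query and target probabilities" concrete, here is a minimal sketch of generic Sinkhorn-Knopp normalization applied to a query-by-target similarity matrix. It is not the authors' Dual Bank Sinkhorn Normalization: no query or target banks are modeled, and the function name `sinkhorn_normalize`, the temperature `beta`, the iteration count, and the toy matrix are illustrative assumptions.

```python
import math

def sinkhorn_normalize(sim, beta=5.0, n_iters=200):
    """Generic Sinkhorn-Knopp sketch (not the paper's DBSN).

    sim[i][j] is the similarity of query i to target j. Rows of exp(beta * sim)
    and columns are rescaled in turn so that every query row and every target
    column ends up carrying a fixed total mass.
    """
    n_q, n_t = len(sim), len(sim[0])
    # Subtract the global max before exponentiating for numerical stability;
    # a constant shift cancels out under the normalization.
    m = max(max(row) for row in sim)
    p = [[math.exp(beta * (s - m)) for s in row] for row in sim]
    col_mass = n_q / n_t  # each target's share when every query has mass 1
    for _ in range(n_iters):
        # Row step: each query spreads unit probability over the targets.
        for row in p:
            total = sum(row)
            for j in range(n_t):
                row[j] /= total
        # Column step: each target receives the same total probability.
        for j in range(n_t):
            total = sum(p[i][j] for i in range(n_q))
            for i in range(n_q):
                p[i][j] *= col_mass / total
    return p

# Toy matrix (hypothetical): target 0 is the nearest neighbor of every query.
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.75],
    [0.95, 0.40, 0.50],
]
p = sinkhorn_normalize(sim)
print([round(sum(row), 6) for row in p])                             # [1.0, 1.0, 1.0]
print([round(sum(p[i][j] for i in range(3)), 6) for j in range(3)])  # [1.0, 1.0, 1.0]
```

Because the column step runs last, the target marginals are exact and the query marginals are approximate; more iterations tighten them. Balancing only the columns would correspond to the target-side balancing the abstract attributes to Inverted Softmax.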

Findings

01. Consistent performance improvements across image-text, video-text, and audio-text retrieval.

02. Effective hubness reduction, demonstrated through comprehensive evaluations.

03. Enhanced probability balancing improves retrieval precision.

Abstract

The past decade has witnessed rapid advancements in cross-modal retrieval, with significant progress made in accurately measuring the similarity between cross-modal pairs. However, the persistent hubness problem, a phenomenon where a small number of targets frequently appear as nearest neighbors to numerous queries, continues to hinder the precision of similarity measurements. Despite several proposed methods to reduce hubness, their underlying mechanisms remain poorly understood. To bridge this gap, we analyze the widely adopted Inverted Softmax approach and demonstrate its effectiveness in balancing target probabilities during retrieval. Building on these insights, we propose a probability-balancing framework for more effective hubness reduction. We contend that balancing target probabilities alone is inadequate and, therefore, extend the framework to balance both query and target…
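The claim that Inverted Softmax balances target probabilities can be illustrated with a toy example. The sketch below assumes a small hand-made similarity matrix in which target 0 is a hub and a hypothetical temperature `beta`; it normalizes each target's scores over the queries (a softmax down each column rather than across each row) and then counts how many queries retrieve each target as their top-1 neighbor, i.e. the target's k-occurrence with k = 1. The helper names `inverted_softmax`, `retrieve`, and `k_occurrence` are illustrative, not the paper's code; here the normalization runs over the test queries themselves, whereas bank-based variants normalize over a separate set of queries.

```python
import math
from collections import Counter

def inverted_softmax(sim, beta=10.0):
    """Softmax over queries for each target column; sim[i][j] = query i vs target j."""
    n_q, n_t = len(sim), len(sim[0])
    probs = [[0.0] * n_t for _ in range(n_q)]
    for j in range(n_t):
        col = [sim[i][j] for i in range(n_q)]
        m = max(col)  # subtract the column max for numerical stability
        weights = [math.exp(beta * (s - m)) for s in col]
        total = sum(weights)
        for i in range(n_q):
            probs[i][j] = weights[i] / total
    return probs

def retrieve(scores):
    """Index of the top-scoring target for each query (ties go to the lower index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

def k_occurrence(top1, n_targets):
    """Number of queries that retrieve each target as their nearest neighbor."""
    counts = Counter(top1)
    return [counts.get(j, 0) for j in range(n_targets)]

# Toy matrix (hypothetical): raw scores make target 0 a hub.
sim = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.75],
    [0.95, 0.40, 0.50],
]
print(k_occurrence(retrieve(sim), 3))                    # [3, 0, 0]
print(k_occurrence(retrieve(inverted_softmax(sim)), 3))  # [1, 1, 1]
```

Only the columns are normalized, so each target's probabilities sum to 1 over the queries while a query's row no longer does; this is the sense in which Inverted Softmax balances targets but not queries.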
