Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts

Bingqing Zhang; Zhuo Cao; Heming Du; Yang Li; Xue Li; Jiajun Liu; Sen Wang

arXiv:2604.20851·cs.IR·April 24, 2026

TL;DR

This paper introduces a new benchmark for evaluating video-text retrieval models under query shifts, analyzes the hubness problem, and proposes HAT-VTR, a test-time adaptation method that significantly improves robustness.

Contribution

The paper presents a comprehensive benchmark for query shifts in video-text retrieval and proposes HAT-VTR, a novel test-time adaptation framework addressing hubness and improving robustness.

Findings

01 HAT-VTR outperforms prior methods across various query shift scenarios.

02 Query shifts significantly increase hubness in video-text retrieval.

03 The benchmark covers 12 types of video perturbation at five severity levels, under which retrieval performance degrades.

Abstract

Modern video-text retrieval (VTR) models excel on in-distribution benchmarks but are highly vulnerable to real-world query shifts, where the distribution of query data deviates from the training domain, leading to a sharp performance drop. Existing image-focused robustness solutions are inadequate to handle this vulnerability in video, as they fail to address the complex spatio-temporal dynamics inherent in these shifts. To systematically evaluate this vulnerability, we first introduce a comprehensive benchmark featuring 12 distinct types of video perturbations across five severity degrees. Analysis on this benchmark reveals that query shifts amplify the hubness phenomenon, where a few gallery items become dominant "hubs" that attract a disproportionate number of queries. To mitigate this, we then propose HAT-VTR (Hubness Alleviation for Test-time Video-Text Retrieval), as our baseline…
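To make the hubness phenomenon concrete, the sketch below counts how often each gallery item is retrieved in a query's top-k list (its k-occurrence) and summarizes the imbalance with the skewness of those counts, which is one common way to quantify hubness. It is an illustration only, not the paper's metric or method: the function names `k_occurrence` and `skewness` and both toy similarity matrices are hypothetical and hand-made.

```python
from collections import Counter


def k_occurrence(sim, k=1):
    """Count how often each gallery item appears in a query's top-k list.

    sim[i][j] is the similarity between query i and gallery item j.
    """
    n_gallery = len(sim[0])
    counts = Counter()
    for row in sim:
        # Indices of the k most similar gallery items for this query.
        top_k = sorted(range(n_gallery), key=lambda j: row[j], reverse=True)[:k]
        counts.update(top_k)
    return [counts.get(j, 0) for j in range(n_gallery)]


def skewness(values):
    """Third standardized moment; returns 0.0 when the values have no variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


# Hand-made toy similarity matrices (rows: queries, columns: gallery items).
# "clean": each query is most similar to its own gallery item.
clean = [
    [0.9, 0.1, 0.2, 0.1],
    [0.2, 0.8, 0.1, 0.3],
    [0.1, 0.2, 0.7, 0.2],
    [0.3, 0.1, 0.2, 0.6],
]
# "shifted": hypothetical shifted queries for which gallery item 0 always wins.
shifted = [
    [0.9, 0.1, 0.2, 0.1],
    [0.7, 0.6, 0.1, 0.3],
    [0.8, 0.2, 0.5, 0.2],
    [0.6, 0.1, 0.2, 0.5],
]

clean_counts = k_occurrence(clean)      # [1, 1, 1, 1]: no hub
shifted_counts = k_occurrence(shifted)  # [4, 0, 0, 0]: item 0 is a hub

print("clean skewness:  ", skewness(clean_counts))
print("shifted skewness:", round(skewness(shifted_counts), 4))
```

In this toy case the balanced counts have zero skewness, while the shifted counts are strongly right-skewed because a single gallery item absorbs every query. On real data the same computation would run over model-produced similarity scores, typically with a larger k.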

Videos

Robust Test-time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts (SlidesLive)