Selecting which Dense Retriever to use for Zero-Shot Search
Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang, Guido Zuccon

TL;DR
This paper introduces the problem of selecting the most effective dense retrieval model for zero-shot search scenarios, highlighting the challenges and evaluating existing unsupervised methods, which are found to be ineffective.
Contribution
It formalizes the zero-shot model selection problem for dense retrieval and empirically evaluates existing unsupervised methods, demonstrating their limitations in this context.
Findings
Existing unsupervised evaluation methods are ineffective for zero-shot dense retrieval model selection.
Model effectiveness varies significantly across datasets, including datasets that were not used to train the models.
The paper highlights the need for new methods to reliably select dense retrievers without labels.
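A minimal sketch of how the quality of an unsupervised selection method could be checked: rank the candidate dense retrievers by the method's label-free proxy score, then compare that ranking with the ranking by true effectiveness on a collection where labels happen to exist. Everything here is an assumption made for illustration, not the paper's protocol: the model names, the scores, the use of Kendall's tau as the agreement measure, and the "regret" measure are all hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a between two equal-length score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def select_model(proxy_scores):
    """Pick the model with the highest label-free proxy score."""
    return max(proxy_scores, key=proxy_scores.get)


# Made-up numbers for three hypothetical dense retrievers.
# proxy: score from some unsupervised method (no relevance labels used).
# true_ndcg: effectiveness measured with labels, used only to judge the proxy.
proxy = {"model_a": 0.61, "model_b": 0.48, "model_c": 0.55}
true_ndcg = {"model_a": 0.31, "model_b": 0.42, "model_c": 0.36}

names = sorted(proxy)
tau = kendall_tau([proxy[n] for n in names], [true_ndcg[n] for n in names])

chosen = select_model(proxy)
best = max(true_ndcg, key=true_ndcg.get)
# Effectiveness lost by trusting the proxy instead of an oracle.
regret = true_ndcg[best] - true_ndcg[chosen]

print(f"tau={tau:.2f} chosen={chosen} best={best} regret={regret:.2f}")
```

In this invented case the proxy orders the models exactly backwards, so it picks the weakest retriever. A selection method would be useful only if its agreement with the true ranking stayed high, and its regret low, across many target collections.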
Abstract
We propose the new problem of choosing which dense retrieval model to use when searching on a new collection for which no labels are available, i.e. in a zero-shot setting. Many dense retrieval models are readily available. Each model however is characterized by very differing search effectiveness -- not just on the test portion of the datasets in which the dense representations have been learned but, importantly, also across different datasets for which data was not used to learn the dense representations. This is because dense retrievers typically require training on a large amount of labeled data to achieve satisfactory search effectiveness in a specific dataset or domain. Moreover, effectiveness gains obtained by dense retrievers on datasets for which they are able to observe labels during training, do not necessarily generalise to datasets that have not been observed during…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Advanced Image and Video Retrieval Techniques
