Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking

Shengyao Zhuang; Bing Liu; Bevan Koopman; Guido Zuccon

arXiv:2310.13243 · cs.IR · October 23, 2023 · 2 cites

TL;DR

This paper shows that large language models pre-trained only on unstructured text, without supervised instruction fine-tuning, are effective zero-shot query likelihood models for document ranking. It also introduces a ranking system that combines LLM-based QLMs with a hybrid zero-shot retriever and reports state-of-the-art effectiveness.

Contribution

It provides empirical evidence of the zero-shot ranking capabilities of LLMs and proposes a novel hybrid system combining LLM-based QLMs with a zero-shot retriever.

Findings

01. LLMs pre-trained only on unstructured text show strong zero-shot ranking performance.

02. Additional instruction fine-tuning may reduce ranking effectiveness unless the fine-tuning dataset includes a question generation task.

03. The proposed hybrid system achieves state-of-the-art results in zero-shot and few-shot settings.

Abstract

In the field of information retrieval, Query Likelihood Models (QLMs) rank documents based on the probability of generating the query given the content of a document. Recently, advanced large language models (LLMs) have emerged as effective QLMs, showcasing promising ranking capabilities. This paper focuses on investigating the genuine zero-shot ranking effectiveness of recent LLMs, which are solely pre-trained on unstructured text data without supervised instruction fine-tuning. Our findings reveal the robust zero-shot ranking ability of such LLMs, highlighting that additional instruction fine-tuning may hinder effectiveness unless a question generation task is present in the fine-tuning dataset. Furthermore, we introduce a novel state-of-the-art ranking system that integrates LLM-based QLMs with a hybrid zero-shot retriever, demonstrating exceptional effectiveness in both zero-shot…
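The query likelihood idea described in the abstract can be illustrated with a small sketch. Each document is scored by the log-probability of generating the query from it, and documents are sorted by that score. The paper estimates this probability with an LLM; the sketch below swaps in a Dirichlet-smoothed unigram language model as a stand-in so that it runs without model weights. It is an assumption-laden illustration, not the authors' implementation: the function names (`tokenize`, `query_log_likelihood`, `rank_documents`), the smoothing parameter `mu`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization; a real system would use the model's tokenizer.
    return text.lower().split()


def query_log_likelihood(query, doc, coll_counts, coll_len, mu=10.0):
    """Return log P(query | doc) under a Dirichlet-smoothed unigram model.

    Stand-in for an LLM (assumption): an LLM-based QLM would instead sum the
    log-probabilities it assigns to each query token, conditioned on the
    document and the preceding query tokens.
    """
    doc_counts = Counter(tokenize(doc))
    doc_len = sum(doc_counts.values())
    vocab_size = len(coll_counts)
    score = 0.0
    for term in tokenize(query):
        # Add-one smoothed collection probability, so unseen terms stay > 0.
        p_coll = (coll_counts[term] + 1) / (coll_len + vocab_size + 1)
        # Dirichlet smoothing mixes document and collection statistics.
        p_term = (doc_counts[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_term)
    return score


def rank_documents(query, docs, mu=10.0):
    """Return (doc_index, score) pairs sorted from most to least likely."""
    coll_counts = Counter(t for d in docs for t in tokenize(d))
    coll_len = sum(coll_counts.values())
    scored = [
        (i, query_log_likelihood(query, d, coll_counts, coll_len, mu))
        for i, d in enumerate(docs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection (hypothetical data, for illustration only).
docs = [
    "the cat sat on the mat",
    "stock markets fell sharply today",
    "dogs and cats are common pets",
]
ranking = rank_documents("cat mat", docs)
for doc_index, score in ranking:
    print(doc_index, round(score, 3))
```

Replacing the body of `query_log_likelihood` with token log-probabilities from a pre-trained causal language model would turn this into an LLM-based QLM in the sense the abstract describes, while the ranking step would stay the same.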

Code & Models

Repositories

ielab/llm-qlm (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies