Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents
Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, Zhaochun Ren

TL;DR
This paper explores the use of large language models like ChatGPT and GPT-4 for passage relevance ranking in information retrieval, demonstrating their competitive performance, creating a new evaluation dataset, and distilling their capabilities into smaller models.
Contribution
It investigates LLMs as re-ranking agents, introduces a new benchmark dataset, and proposes a distillation method to create efficient, high-performing smaller models.
Findings
Properly instructed LLMs such as ChatGPT and GPT-4 can match or even outperform state-of-the-art supervised methods on popular IR benchmarks.
A new dataset, NovelEval, tests models on unseen knowledge.
Distilled models achieve superior efficiency and performance.
Abstract
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks, including search engines. However, existing work utilizes the generative ability of LLMs for Information Retrieval (IR) rather than direct passage ranking. The discrepancy between the pre-training objectives of LLMs and the ranking objective poses another challenge. In this paper, we first investigate generative LLMs such as ChatGPT and GPT-4 for relevance ranking in IR. Surprisingly, our experiments reveal that properly instructed LLMs can deliver competitive, even superior results to state-of-the-art supervised methods on popular IR benchmarks. Furthermore, to address concerns about data contamination of LLMs, we collect a new test set called NovelEval, based on the latest knowledge and aiming to verify the model's ability to rank unknown knowledge. Finally, to…
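To make the idea of instructing an LLM to rank passages directly more concrete, here is a minimal Python sketch. It is not the authors' implementation: the prompt wording, the bracketed `[n]` identifier format, and the helper names `build_ranking_prompt` and `parse_ranking` are all hypothetical choices made for illustration, and no model is actually called. It only shows how a query and candidate passages might be packed into one instruction, and how a reply listing identifiers could be turned back into an ordering.

```python
import re


def build_ranking_prompt(query, passages):
    """Pack a query and candidate passages into one ranking instruction.

    The wording and the [n] identifier format are illustrative assumptions.
    """
    lines = [f"Rank the following {len(passages)} passages by relevance to the query."]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append(f"Query: {query}")
    lines.append("Answer with identifiers only, most relevant first, e.g. [2] > [1].")
    return "\n".join(lines)


def parse_ranking(response, num_passages):
    """Convert a reply such as '[2] > [3] > [1]' into zero-based passage indices."""
    order = []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1
        # Ignore out-of-range identifiers and repeated identifiers.
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    # Append any passages the reply omitted, keeping their original order.
    order.extend(i for i in range(num_passages) if i not in order)
    return order
```

In use, the string returned by `build_ranking_prompt` would be sent to whichever model is being evaluated, and its text reply passed to `parse_ranking`. The fallback that appends omitted passages is a defensive choice so that a malformed reply still yields a complete ordering.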
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Absolute Position Encodings · Residual Connection · Softmax
