Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents

Weiwei Sun; Lingyong Yan; Xinyu Ma; Shuaiqiang Wang; Pengjie Ren; Zhumin Chen; Dawei Yin; Zhaochun Ren

arXiv:2304.09542 · cs.CL · December 31, 2024 · 23 cites

TL;DR

This paper explores the use of large language models like ChatGPT and GPT-4 for passage relevance ranking in information retrieval, demonstrating their competitive performance, creating a new evaluation dataset, and distilling their capabilities into smaller models.

Contribution

It investigates LLMs as re-ranking agents, introduces a new benchmark dataset, and proposes a distillation method to create efficient, high-performing smaller models.

Findings

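01 Properly instructed LLMs can match or outperform state-of-the-art supervised methods on popular IR benchmarks.

02 A new test set, NovelEval, checks whether models can rank passages about knowledge they have not seen.

03 Distilled models achieve superior efficiency and performance.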

Abstract

Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks, including search engines. However, existing work utilizes the generative ability of LLMs for Information Retrieval (IR) rather than direct passage ranking. The discrepancy between the pre-training objectives of LLMs and the ranking objective poses another challenge. In this paper, we first investigate generative LLMs such as ChatGPT and GPT-4 for relevance ranking in IR. Surprisingly, our experiments reveal that properly instructed LLMs can deliver competitive, even superior results to state-of-the-art supervised methods on popular IR benchmarks. Furthermore, to address concerns about data contamination of LLMs, we collect a new test set called NovelEval, based on the latest knowledge and aiming to verify the model's ability to rank unknown knowledge. Finally, to…
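To make "properly instructed" concrete, here is a minimal sketch of one way a generative LLM could be asked to re-rank passages: number the candidates, request an ordering of identifiers, and parse the reply defensively. The abstract does not give the prompt wording or output format, so the `[2] > [1] > [3]` convention and the helper names `build_rerank_prompt` and `parse_ranking` are assumptions made for illustration, not the authors' implementation. The model call itself is left out.

```python
import re
from typing import List


def build_rerank_prompt(query: str, passages: List[str]) -> str:
    """Build a listwise prompt: numbered passages plus a ranking instruction.

    The wording and the '[i]' identifier format are illustrative assumptions.
    """
    lines = [
        f"Rank the {len(passages)} passages below by relevance to the query.",
        f"Query: {query}",
    ]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append(
        "Answer with identifiers only, most relevant first, "
        "for example: [2] > [1] > [3]"
    )
    return "\n".join(lines)


def parse_ranking(response: str, num_passages: int) -> List[int]:
    """Convert the model's reply into 0-based passage indices.

    Out-of-range and repeated identifiers are dropped; passages the model
    did not mention are appended in their original order, so the result is
    always a full permutation.
    """
    ranked: List[int] = []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1
        if 0 <= idx < num_passages and idx not in ranked:
            ranked.append(idx)
    missing = [i for i in range(num_passages) if i not in ranked]
    return ranked + missing


if __name__ == "__main__":
    passages = [
        "Bread is baked from flour and water.",
        "BM25 is a lexical ranking function.",
        "Re-ranking reorders retrieved passages by relevance.",
    ]
    print(build_rerank_prompt("what is passage re-ranking", passages))
    # A hypothetical model reply, including a repeated and an invalid id.
    print(parse_ranking("[3] > [2] > [3] > [7]", len(passages)))
```

The defensive parsing matters because free-form generation can repeat, omit, or invent identifiers; falling back to the original order for unmentioned passages keeps the first-stage retrieval ranking as a tie-breaker.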

Code & Models

Repositories

sunnweiwei/rankgpt
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Absolute Position Encodings · Residual Connection · Softmax