Re-Ranking Step by Step: Investigating Pre-Filtering for Re-Ranking with Large Language Models

Baharan Nouriinanloo; Maxime Lamothe

arXiv:2406.18740 · cs.CL · June 28, 2024 · 2 cites

Open Access

TL;DR

This paper explores a pre-filtering step for passage re-ranking in information retrieval. The step combines a small number of human-generated relevance scores with LLM relevance scoring to discard irrelevant passages before re-ranking. This lets smaller models perform competitively with larger proprietary models.

Contribution

It introduces a pre-filtering step that enhances re-ranking performance, making smaller models competitive with large proprietary LLMs.

Findings

1. Pre-filtering improves re-ranking accuracy.
2. With pre-filtering, smaller models such as Mixtral can match larger models.
3. Pre-filtering reduces reliance on expensive proprietary LLMs.

Abstract

Large Language Models (LLMs) have been revolutionizing a myriad of natural language processing tasks with their diverse zero-shot capabilities. Indeed, existing work has shown that LLMs can be used to great effect for many tasks, such as information retrieval (IR) and passage ranking. However, current state-of-the-art results rely heavily on the capabilities of the LLM being used. Currently, very large proprietary LLMs such as GPT-4 are the highest-performing passage re-rankers. Hence, users who lack the resources to leverage top-of-the-line LLMs, or who cannot use closed-source ones, are at a disadvantage. In this paper, we investigate the use of a pre-filtering step before passage re-ranking in IR. Our experiments show that by using a small number of human-generated relevance scores, coupled with LLM relevance scoring, it is effectively possible to filter out irrelevant passages before…
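The abstract is cut off before it details the procedure. What follows is therefore only a minimal Python sketch of the general idea, built on our own assumptions rather than the authors' exact method. First, an LLM assigns each passage a relevance score in [0, 1]; here a hypothetical lookup table, `toy_llm_scores`, stands in for the LLM call. Second, a handful of human-labelled passages are used to pick a score cutoff. Choosing the cutoff that maximizes F1 is our assumption, since the abstract does not name a criterion. Third, passages below the cutoff are dropped and the survivors are passed to a re-ranker. The function names `choose_threshold`, `pre_filter`, and `rerank` are hypothetical. The simple sort-by-score `rerank` is a placeholder for whatever LLM re-ranker is actually used.

```python
from typing import Callable, List, Sequence


def f1_at(threshold: float, scores: Sequence[float], labels: Sequence[int]) -> float:
    """F1 of the rule 'relevant iff score >= threshold' against human labels (1/0)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def choose_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed calibration: pick the cutoff with the best F1 on a small labelled set.

    Candidates are tried in ascending order. On ties, max() keeps the first
    (lowest) cutoff, which lets more passages through to the re-ranker.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("need equally sized, non-empty score and label lists")
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at(t, scores, labels))


def pre_filter(passages: List[str], score_fn: Callable[[str], float], threshold: float) -> List[str]:
    """Drop passages whose (LLM) relevance score falls below the cutoff."""
    return [p for p in passages if score_fn(p) >= threshold]


def rerank(passages: List[str], rank_fn: Callable[[str], float]) -> List[str]:
    """Placeholder re-ranker: sort by score, highest first (stands in for an LLM re-ranker)."""
    return sorted(passages, key=rank_fn, reverse=True)


if __name__ == "__main__":
    # Hypothetical calibration data: LLM scores for a few passages that a human
    # has labelled as relevant (1) or irrelevant (0).
    calib_scores = [0.9, 0.8, 0.35, 0.3, 0.1]
    calib_labels = [1, 1, 1, 0, 0]
    t = choose_threshold(calib_scores, calib_labels)

    # Hypothetical candidate passages for one query, with stand-in LLM scores.
    toy_llm_scores = {"p1": 0.92, "p2": 0.15, "p3": 0.55, "p4": 0.34, "p5": 0.71}
    score = toy_llm_scores.__getitem__
    kept = pre_filter(list(toy_llm_scores), score, t)
    print(t, rerank(kept, score))
```

Running the script prints `0.35 ['p1', 'p5', 'p3']`. On the toy calibration set, 0.35 is the cutoff that keeps every human-labelled relevant passage and excludes the irrelevant ones. As a result, `p2` and `p4` never reach the more expensive re-ranking step. That is the kind of saving a pre-filter is meant to provide.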


Taxonomy

Topics: Data Quality and Management · Topic Modeling · Spam and Phishing Detection

Methods: Attention Is All You Need · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer