Re-Ranking Step by Step: Investigating Pre-Filtering for Re-Ranking with Large Language Models
Baharan Nouriinanloo, Maxime Lamothe

TL;DR
This paper explores a pre-filtering step for passage re-ranking in information retrieval. The step combines a small number of human-generated relevance scores with LLM relevance scoring to remove irrelevant passages before re-ranking, which lets smaller models perform competitively with larger proprietary models.
Contribution
It introduces a pre-filtering step that enhances re-ranking performance, making smaller models competitive with large proprietary LLMs.
Findings
Pre-filtering improves re-ranking accuracy.
Smaller models such as Mixtral become competitive with much larger proprietary models when pre-filtering is applied.
Pre-filtering reduces reliance on expensive proprietary LLMs.
Abstract
Large Language Models (LLMs) have been revolutionizing a myriad of natural language processing tasks with their diverse zero-shot capabilities. Indeed, existing work has shown that LLMs can be used to great effect for many tasks, such as information retrieval (IR) and passage ranking. However, current state-of-the-art results lean heavily on the capabilities of the LLM being used. Currently, proprietary and very large LLMs such as GPT-4 are the highest-performing passage re-rankers. Hence, users without the resources to leverage top-of-the-line LLMs, or ones that are closed source, are at a disadvantage. In this paper, we investigate the use of a pre-filtering step before passage re-ranking in IR. Our experiments show that by using a small number of human-generated relevance scores, coupled with LLM relevance scoring, it is effectively possible to filter out irrelevant passages before…
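To make the pre-filtering idea concrete, here is a minimal Python sketch. The abstract is truncated and does not say how the human-generated scores are combined with the LLM scores, so everything specific here is an assumption: the choice of an F1-maximizing cutoff, the function names `pick_threshold` and `pre_filter`, and the `score_fn` callable, which is a hypothetical stand-in for an LLM relevance-scoring call. It is an illustration of the general technique, not the authors' method.

```python
from typing import Callable, List, Sequence


def pick_threshold(llm_scores: Sequence[float], human_labels: Sequence[bool]) -> float:
    """Pick the LLM-score cutoff that maximizes F1 on a small human-labeled sample.

    Assumption: F1 is used as the selection criterion; the source does not specify one.
    """
    best_t, best_f1 = min(llm_scores), -1.0
    for t in sorted(set(llm_scores)):
        pairs = list(zip(llm_scores, human_labels))
        tp = sum(1 for s, y in pairs if s >= t and y)       # kept and relevant
        fp = sum(1 for s, y in pairs if s >= t and not y)   # kept but irrelevant
        fn = sum(1 for s, y in pairs if s < t and y)        # dropped but relevant
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:  # strict comparison keeps the lowest cutoff on ties
            best_t, best_f1 = t, f1
    return best_t


def pre_filter(passages: Sequence[str],
               score_fn: Callable[[str], float],
               threshold: float) -> List[str]:
    """Keep only passages whose score meets the cutoff, preserving input order."""
    return [p for p in passages if score_fn(p) >= threshold]


# Toy data: four passages with LLM scores and human relevance judgments.
sample_scores = [0.1, 0.4, 0.6, 0.9]
sample_labels = [False, False, True, True]
threshold = pick_threshold(sample_scores, sample_labels)

# Hypothetical scorer backed by a lookup table instead of a real LLM call.
toy_scores = {"passage a": 0.2, "passage b": 0.7, "passage c": 0.6}
kept = pre_filter(list(toy_scores), toy_scores.get, threshold)
```

Only the passages in `kept` would then be passed to the re-ranker, so the more expensive ranking step sees a shorter candidate list.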
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Spam and Phishing Detection
Methods: Attention Is All You Need · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer
