TL;DR
REALM is an uncertainty-aware re-ranking framework for LLMs that models relevance as Gaussian distributions and refines them through recursive Bayesian updates, improving ranking accuracy while reducing token usage and latency.
Contribution
The paper presents a novel uncertainty-aware re-ranking method, REALM, that models relevance probabilistically and refines rankings recursively, outperforming existing approaches in efficiency and accuracy.
Findings
Surpasses state-of-the-art re-rankers, improving NDCG@10 by 0.7-11.9
Reduces the number of LLM inferences by 23.4-84.4%
Significantly reduces token usage and latency
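For readers unfamiliar with the metric, NDCG@10 can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it assumes linear gain with a log2 position discount (some toolkits use an exponential gain instead), and it assumes the input list contains every judged document for the query, so that sorting it yields the ideal ranking. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (linear gain)."""
    # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """NDCG@k for graded relevance labels given in ranked order."""
    # The ideal ranking sorts the same labels from most to least relevant.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Labels are listed in the order the re-ranker returned the documents.
perfect = ndcg_at_k([3, 2, 1, 0])   # already in ideal order
reversed_ = ndcg_at_k([0, 1, 2, 3]) # worst order for these labels
```

A perfectly ordered list scores 1.0, and any misordering of unequal labels within the top k lowers the score.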
Abstract
Large Language Models (LLMs) have shown strong capabilities in document re-ranking, a key component in modern Information Retrieval (IR) systems. However, existing LLM-based approaches face notable limitations, including ranking uncertainty, unstable top-k recovery, and high token cost due to token-intensive prompting. To address these limitations, we propose REALM, an uncertainty-aware re-ranking framework that models LLM-derived relevance as Gaussian distributions and refines them through recursive Bayesian updates. By explicitly capturing uncertainty and minimizing redundant queries, REALM achieves better rankings more efficiently. Experimental results demonstrate that REALM surpasses state-of-the-art re-rankers while significantly reducing token usage and latency, improving NDCG@10 by 0.7-11.9 and simultaneously reducing the number of LLM inferences by 23.4-84.4%,…
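The idea of treating relevance as a Gaussian and refining it recursively can be illustrated with the textbook conjugate Gaussian update. The abstract does not give REALM's actual update equations, so the sketch below is not the paper's algorithm: it assumes each LLM judgment is a noisy scalar observation of a document's relevance with a known observation variance. The names `Belief`, `bayes_update`, `refine`, and `obs_var` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """Gaussian belief over one document's relevance (assumed representation)."""
    mean: float
    var: float


def bayes_update(prior, obs, obs_var):
    """Conjugate Gaussian update for one noisy observation with known variance."""
    # Precisions (inverse variances) add; the posterior mean is the
    # precision-weighted average of the prior mean and the observation.
    precision = 1.0 / prior.var + 1.0 / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior.mean / prior.var + obs / obs_var)
    return Belief(post_mean, post_var)


def refine(prior, observations, obs_var):
    """Apply the update recursively: each posterior becomes the next prior."""
    belief = prior
    for obs in observations:
        belief = bayes_update(belief, obs, obs_var)
    return belief


# Two documents start from the same vague prior and receive different scores.
prior = Belief(mean=0.0, var=1.0)
doc_a = refine(prior, [1.0, 1.0], obs_var=1.0)
doc_b = refine(prior, [0.2], obs_var=1.0)
ranking = sorted([("a", doc_a), ("b", doc_b)], key=lambda item: item[1].mean, reverse=True)
```

Each observation shrinks the variance, so a system built this way could stop querying the LLM about documents whose relevance is already well determined. That is one plausible reading of how explicit uncertainty could reduce redundant queries.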