Drowning in Documents: Consequences of Scaling Reranker Inference
Mathew Jacob, Erik Lindgren, Matei Zaharia, Michael Carbin, Omar Khattab, Andrew Drozdov

TL;DR
This paper investigates how effective rerankers actually are in full retrieval, finding that their gains shrink, and can even reverse, as they score more documents. This challenges the common assumption that rerankers outperform cheaper first-stage retrieval.
Contribution
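It evaluates rerankers in realistic full-retrieval settings, pairing them with strong dense first-stage retrievers and testing them on challenging, contamination-free and out-of-domain tasks. The results expose their limitations and motivate better reranking methods.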
Findings
Rerankers improve over first-stage retrieval at first, but their gains shrink as they score more documents
Beyond a certain number of scored documents, reranking can degrade retrieval quality
Strong first-stage retrieval is crucial for reranker effectiveness
Abstract
Rerankers, typically cross-encoders, are computationally intensive but are frequently used because they are widely assumed to outperform cheaper initial IR systems. We challenge this assumption by measuring reranker performance for full retrieval, not just re-scoring first-stage retrieval. To provide a more robust evaluation, we prioritize strong first-stage retrieval using modern dense embeddings and test rerankers on a variety of carefully chosen, challenging tasks, including internally curated datasets to avoid contamination, and out-of-domain ones. Our empirical results reveal a surprising trend: the best existing rerankers provide initial improvements when scoring progressively more documents, but their effectiveness gradually declines and can even degrade quality beyond a certain limit. We hope that our findings will spur future research to improve reranking.
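The evaluation protocol described above (rerank the top k first-stage candidates, vary k, and track a fixed-cutoff metric) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the choice of Recall@10, and the toy scores are all hypothetical. The toy reranker is deliberately rigged to overrate a handful of deep documents, so the sweep mimics the qualitative trend the paper reports without reproducing any of its data.

```python
from typing import Callable, Dict, List, Sequence, Set


def rerank_top_k(first_stage: Sequence[str], score: Callable[[str], float], k: int) -> List[str]:
    # Rerank only the first k candidates; the tail keeps its first-stage order.
    head = sorted(first_stage[:k], key=score, reverse=True)
    return head + list(first_stage[k:])


def recall_at(ranking: Sequence[str], relevant: Set[str], cutoff: int) -> float:
    # Fraction of relevant documents that appear in the top `cutoff` positions.
    return len(set(ranking[:cutoff]) & relevant) / len(relevant)


def sweep_k(first_stage, score, relevant, ks, cutoff=10) -> Dict[int, float]:
    # Recall@cutoff for each reranking depth k.
    return {k: recall_at(rerank_top_k(first_stage, score, k), relevant, cutoff) for k in ks}


# Hypothetical toy data, NOT results from the paper.
first_stage = [f"d{i}" for i in range(100)]       # first-stage ranking d0..d99
relevant = {"d3", "d7", "d12", "d25"}              # some relevant docs sit below rank 10
distractors = {f"d{i}" for i in range(60, 67)}     # 7 deep docs the toy reranker overrates


def toy_score(doc: str) -> float:
    # Hypothetical reranker: scores deep distractors above truly relevant docs.
    if doc in distractors:
        return 0.95
    if doc in relevant:
        return 0.9
    return 0.1


print(sweep_k(first_stage, toy_score, relevant, ks=[10, 50, 100]))
# {10: 0.5, 50: 1.0, 100: 0.75}
```

Reranking deeper first pulls relevant documents from ranks 12 and 25 into the top 10, but at k = 100 the overrated distractors enter the candidate pool and push some relevant documents back out. Python's `sorted` is stable even with `reverse=True`, so ties keep their first-stage order, which makes the toy output deterministic.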
