Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
Keshav Santhanam, Jon Saad-Falcon, Martin Franz, Omar Khattab, Avirup Sil, Radu Florian, Md Arafat Sultan, Salim Roukos, Matei Zaharia, Christopher Potts

TL;DR
This paper advocates for more comprehensive IR benchmarking that includes efficiency metrics like latency and hardware costs alongside accuracy, to better reflect real-world deployment considerations.
Contribution
It introduces a framework for IR benchmarks to incorporate efficiency metrics, highlighting their impact on system choice and encouraging more holistic evaluation methods.
Findings
Efficiency considerations significantly influence IR system selection.
Current benchmarks focus mainly on accuracy, neglecting deployment costs.
Including efficiency metrics leads to different optimal IR system choices.
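The third finding can be illustrated with a minimal sketch. It assumes each candidate system has been measured for accuracy and per-query latency on the same fixed hardware, then picks the most accurate system that fits a latency budget and lists the systems on the accuracy/latency Pareto frontier. The system names, metric values, and the helpers `best_under_budget` and `pareto_frontier` are hypothetical illustrations, not data or code from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SystemResult:
    name: str
    accuracy: float    # e.g. a ranking metric on a dev set (hypothetical values)
    latency_ms: float  # per-query latency on one fixed hardware setting


def best_under_budget(results: List[SystemResult], max_latency_ms: float) -> Optional[SystemResult]:
    """Return the most accurate system whose latency fits the budget, or None."""
    feasible = [r for r in results if r.latency_ms <= max_latency_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda r: r.accuracy)


def pareto_frontier(results: List[SystemResult]) -> List[SystemResult]:
    """Keep systems that no other system beats on both latency and accuracy."""
    frontier = []
    for r in results:
        dominated = any(
            o.latency_ms <= r.latency_ms
            and o.accuracy >= r.accuracy
            and (o.latency_ms < r.latency_ms or o.accuracy > r.accuracy)
            for o in results
        )
        if not dominated:
            frontier.append(r)
    return sorted(frontier, key=lambda r: r.latency_ms)


# Hypothetical numbers for illustration only; not results reported in the paper.
systems = [
    SystemResult("system-A", accuracy=0.19, latency_ms=10),
    SystemResult("system-B", accuracy=0.33, latency_ms=60),
    SystemResult("system-C", accuracy=0.39, latency_ms=250),
    SystemResult("system-D", accuracy=0.31, latency_ms=120),
]

for budget in (50, 100, 500):
    choice = best_under_budget(systems, budget)
    print(budget, choice.name if choice else None)
# 50 system-A
# 100 system-B
# 500 system-C
```

An accuracy-only leaderboard would always rank system-C first, while each latency budget yields a different winner; system-D never wins because system-B is both faster and more accurate.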
Abstract
Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these…
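The abstract pairs query latency with a cost budget for a reproducible hardware setting. One simple way to turn a latency measurement into a dollar figure is sketched here, assuming hardware billed at a fixed hourly price and one query served at a time per worker with no batching. The function names and the $3.00/hour price are hypothetical, and the paper's own cost accounting may differ.

```python
# Hypothetical helpers for illustration; not the paper's cost methodology.


def serial_throughput(latency_ms: float, workers: int = 1) -> float:
    """Queries per second if each worker serves one query at a time.

    Assumes no batching and no contention between workers, which
    simplifies real serving setups.
    """
    if latency_ms <= 0 or workers <= 0:
        raise ValueError("latency and worker count must be positive")
    return workers * 1000.0 / latency_ms


def cost_per_million_queries(hourly_price_usd: float, queries_per_second: float) -> float:
    """Dollar cost of serving one million queries on hardware billed per hour."""
    if queries_per_second <= 0:
        raise ValueError("throughput must be positive")
    hours_needed = 1_000_000 / queries_per_second / 3600
    return hourly_price_usd * hours_needed


# Hypothetical setting: one worker, 50 ms per query, a $3.00/hour machine.
qps = serial_throughput(latency_ms=50)              # 20.0 queries/second
cost = cost_per_million_queries(3.00, qps)
print(f"{qps:.1f} qps, ${cost:.2f} per 1M queries")  # 20.0 qps, $41.67 per 1M queries
```

Under this simplified model, cost scales linearly with latency at a fixed hourly price, which is why a latency number is only comparable across systems when the hardware setting is held fixed.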
