Are LLM-Based Retrievers Worth Their Cost? An Empirical Study of Efficiency, Robustness, and Reasoning Overhead

Abdelrahman Abdallah; Jamie Holdcroft; Mohammed Ali; Adam Jatowt

arXiv:2604.03676·cs.IR·April 7, 2026

TL;DR

This study evaluates LLM-based retrievers on efficiency, robustness, and reasoning overhead across 12 BRIGHT tasks and 14 retrievers, weighing effectiveness against latency and testing whether confidence scores reliably predict query success.

Contribution

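It reproduces the BRIGHT benchmark across 12 tasks and 14 retrievers, extends the evaluation with cold-start indexing cost, query latency and throughput, corpus scaling, and robustness to query perturbations, and quantifies both reasoning overhead and the reliability of confidence scores.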

Findings

01

Some reasoning-specialized retrievers achieve high effectiveness with competitive throughput.

02

Large LLM-based bi-encoders often incur high latency with modest gains.

03

Confidence scores are unreliable for downstream decision-making.
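
A minimal sketch of how a finding like this can be checked: AUROC measures how well a per-query confidence score separates successful queries from failed ones, with 0.5 meaning no better than chance. The data below and the definition of "success" (a binary label per query) are illustrative assumptions, not values or definitions taken from the paper.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen successful query has a higher
    confidence score than a randomly chosen failed one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical inputs: one confidence score per query (for example the
# top-1 retrieval score) and a binary success label per query.
confidence = [0.91, 0.85, 0.80, 0.62, 0.60, 0.40]
success = [1, 0, 1, 0, 1, 0]

print(f"AUROC = {auroc(confidence, success):.3f}")
```

A value near 0.5 would indicate that the confidence score carries little information about whether retrieval succeeded, which is the sense in which a score is unreliable for downstream decisions.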

Abstract

Large language model retrievers improve performance on complex queries, but their practical value depends on efficiency, robustness, and reliable confidence signals in addition to accuracy. We reproduce a reasoning-intensive retrieval benchmark (BRIGHT) across 12 tasks and 14 retrievers, and extend evaluation with cold-start indexing cost, query latency distributions and throughput, corpus scaling, robustness to controlled query perturbations, and confidence use (AUROC) for predicting query success. We also quantify reasoning overhead by comparing standard queries to five provided reasoning-augmented variants, measuring accuracy gains relative to added latency. We find that some reasoning-specialized retrievers achieve strong effectiveness while remaining competitive in throughput, whereas several large LLM-based bi-encoders incur substantial latency for modest gains. Reasoning…
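
The idea of measuring accuracy gains relative to added latency can be sketched as a simple ratio. The formula, the `RunStats` type, the field names, and the numbers below are all assumptions made for illustration; the paper's exact definition of reasoning overhead is not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    effectiveness: float    # assumed metric, e.g. nDCG@10 averaged over queries
    mean_latency_ms: float  # assumed mean per-query latency in milliseconds


def reasoning_overhead(standard: RunStats, augmented: RunStats) -> dict:
    """Compare a standard-query run with a reasoning-augmented run."""
    gain = augmented.effectiveness - standard.effectiveness
    added = augmented.mean_latency_ms - standard.mean_latency_ms
    # Gain per added millisecond is only meaningful when latency increased.
    per_ms = gain / added if added > 0 else None
    return {"gain": gain, "added_latency_ms": added, "gain_per_ms": per_ms}


# Hypothetical numbers, not results from the paper.
standard = RunStats(effectiveness=0.20, mean_latency_ms=40.0)
augmented = RunStats(effectiveness=0.26, mean_latency_ms=100.0)

result = reasoning_overhead(standard, augmented)
print(result)
```

Reporting the gain and the added latency alongside the ratio keeps the comparison interpretable when the added latency is close to zero or negative.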

Code & Models

Repositories

datascienceuibk/LLM-Retrievers-Beyond-Relevance
github
