On the Robustness of LLM-Based Dense Retrievers: A Systematic Analysis of Generalizability and Stability

Yongkang Li; Panagiotis Eustratiadis; Yixing Fan; Evangelos Kanoulas

arXiv:2604.16576·cs.IR·April 21, 2026

TL;DR

This paper systematically evaluates the robustness of LLM-based dense retrievers from two perspectives: generalizability across datasets and stability under perturbations such as typos and poisoning. The results are intended to inform future retriever design.

Contribution

The paper presents the first systematic analysis of LLM-based retriever robustness, evaluating generalizability on four benchmarks spanning 30 datasets and stability under several attack types, and it highlights the factors that influence both.

Findings

01

Instruction-tuned models generally excel, but those optimized for complex reasoning often pay a "specialization tax" and generalize less broadly.

02

LLM retrievers are more robust to typos and poisoning than encoder-only models.

03

Larger models tend to be more robust and stable.

Abstract

Decoder-only large language models (LLMs) are increasingly replacing BERT-style architectures as the backbone for dense retrieval, achieving substantial performance gains and broad adoption. However, the robustness of these LLM-based retrievers remains underexplored. In this paper, we present the first systematic study of the robustness of state-of-the-art open-source LLM-based dense retrievers from two complementary perspectives: generalizability and stability. For generalizability, we evaluate retrieval effectiveness across four benchmarks spanning 30 datasets, using linear mixed-effects models to estimate marginal mean performance and disentangle intrinsic model capability from dataset heterogeneity. Our analysis reveals that while instruction-tuned models generally excel, those optimized for complex reasoning often suffer a "specialization tax," exhibiting limited generalizability…
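
The abstract names linear mixed-effects models but, as excerpted here, does not give the specification. As a minimal sketch, assuming a single random intercept per dataset (the paper's actual model may include further terms), the idea can be written as:

```latex
% Hypothetical random-intercept specification; the symbols are illustrative
% and are not taken from the paper.
y_{m,d} = \mu_m + u_d + \varepsilon_{m,d}, \qquad
u_d \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{m,d} \sim \mathcal{N}(0, \sigma^2)
```

Under this assumed form, y_{m,d} is the effectiveness score of retriever m on dataset d, the fixed effect \mu_m plays the role of the marginal mean performance of retriever m, and the random intercept u_d absorbs how easy or hard dataset d is for all retrievers. Separating the two terms is what lets model capability be compared without dataset heterogeneity dominating the average.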

Code & Models

Repositories

liyongkang123/Robust_LLM_Retriever_Eval (GitHub)