A Comparative Study of Specialized LLMs as Dense Retrievers
Hengran Zhang, Keping Bi, Jiafeng Guo

TL;DR
This study systematically evaluates how domain-specific specialization of large language models affects their effectiveness as dense retrievers, covering zero-shot text retrieval (BEIR), zero-shot code retrieval (CoIR), and supervised fine-tuning on MS MARCO.
Contribution
It provides a comprehensive analysis of the impact of specialization on LLM retrieval performance, highlighting the benefits of vision-language and code-specialized models.
Findings
Vision-language and code-specialized LLMs outperform others in zero-shot retrieval.
Mathematical and long reasoning specializations degrade retrieval performance.
Some specialized models surpass traditional methods like BM25 in code retrieval.
Abstract
While large language models (LLMs) are increasingly deployed as dense retrievers, the impact of their domain-specific specialization on retrieval effectiveness remains underexplored. This investigation systematically examines how task-specific adaptations in LLMs influence their retrieval capabilities, an essential step toward developing unified retrievers capable of handling text, code, images, and multimodal content. We conduct extensive experiments with eight Qwen2.5 7B LLMs, including base, instruction-tuned, code/math-specialized, long reasoning, and vision-language models across zero-shot retrieval settings and the supervised setting. For the zero-shot retrieval settings, we consider text retrieval from the BEIR benchmark and code retrieval from the CoIR benchmark. Further, to evaluate supervised performance, all LLMs are fine-tuned on the MS MARCO dataset. We find that…
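As a rough illustration of what using an LLM as a dense retriever involves, the sketch below pools per-token vectors into a single embedding and ranks documents by cosine similarity to the query. It is a minimal sketch under stated assumptions, not the authors' pipeline: the excerpt above does not say which pooling strategy or similarity function the paper uses, so mean pooling and cosine similarity are assumed here, and the toy vectors stand in for real hidden states from a model. The names `mean_pool`, `cosine`, and `rank` are hypothetical helpers introduced for illustration.

```python
import math

def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size embedding (assumed pooling)."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for token-level hidden states (not real model outputs).
query_tokens = [[1.0, 0.0], [0.8, 0.2]]
docs_tokens = {
    "doc_code": [[0.9, 0.1], [1.0, 0.0]],
    "doc_text": [[0.0, 1.0], [0.1, 0.9]],
}

query_vec = mean_pool(query_tokens)
doc_vecs = {doc_id: mean_pool(tokens) for doc_id, tokens in docs_tokens.items()}
ranking = rank(query_vec, doc_vecs)
```

In the zero-shot setting, embeddings like these would come from the model without retrieval-specific training; in the supervised setting, the model would first be fine-tuned (on MS MARCO, per the abstract) before the same ranking step is applied.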
Taxonomy
Topics: Natural Language Processing Techniques · Library Science and Information Systems
