A Comparative Study of Specialized LLMs as Dense Retrievers

Hengran Zhang; Keping Bi; Jiafeng Guo

arXiv:2507.03958 · cs.IR · August 7, 2025

TL;DR

This study systematically evaluates how domain-specific specialization of large language models affects their effectiveness as dense retrievers, covering zero-shot text retrieval (BEIR), zero-shot code retrieval (CoIR), and supervised fine-tuning on MS MARCO.

Contribution

The paper provides a comprehensive analysis of how specialization affects LLM retrieval performance, highlighting the benefits of vision-language and code-specialized models.

Findings

01. Vision-language and code-specialized LLMs outperform others in zero-shot retrieval.

02. Mathematical and long reasoning specializations degrade retrieval performance.

03. Some specialized models surpass traditional methods like BM25 in code retrieval.

Abstract

While large language models (LLMs) are increasingly deployed as dense retrievers, the impact of their domain-specific specialization on retrieval effectiveness remains underexplored. This investigation systematically examines how task-specific adaptations in LLMs influence their retrieval capabilities, an essential step toward developing unified retrievers capable of handling text, code, images, and multimodal content. We conduct extensive experiments with eight Qwen2.5 7B LLMs, including base, instruction-tuned, code/math-specialized, long reasoning, and vision-language models across zero-shot retrieval settings and the supervised setting. For the zero-shot retrieval settings, we consider text retrieval from the BEIR benchmark and code retrieval from the CoIR benchmark. Further, to evaluate supervised performance, all LLMs are fine-tuned on the MS MARCO dataset. We find that…
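
For readers unfamiliar with the setup, a dense retriever ranks documents by how similar their embeddings are to the query embedding. The following is a minimal sketch, assuming cosine similarity and small hand-written vectors standing in for LLM embeddings; the abstract does not specify the paper's pooling or scoring choices, and `rank_documents` is a hypothetical helper name used only for illustration.

```python
import numpy as np


def rank_documents(query_vec, doc_vecs):
    """Rank documents by cosine similarity to the query (hypothetical helper).

    query_vec: 1-D array of shape (dim,)
    doc_vecs:  2-D array of shape (num_docs, dim)
    Returns (order, scores): document indices from most to least similar,
    and the matching similarity scores in the same order.
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # Negate so argsort (ascending) yields descending similarity.
    order = np.argsort(-scores)
    return order, scores[order]


# Toy vectors (assumption): in practice these would come from an LLM encoder.
doc_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_vec = np.array([0.9, 0.1, 0.0])

order, scores = rank_documents(query_vec, doc_vecs)
print(order.tolist())
```

In a zero-shot setting the embeddings come from the model without retrieval-specific training, whereas in the supervised setting the encoder is first fine-tuned (here, on MS MARCO) before the same ranking step is applied.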

Taxonomy

Topics: Natural Language Processing Techniques · Library Science and Information Systems