Why Large Language Models can Secretly Outperform Embedding Similarity in Information Retrieval

Matei Benescu; Ivo Pascal de Jong

arXiv:2603.08077·cs.IR·March 10, 2026

TL;DR

This paper investigates whether Large Language Models can surpass traditional embedding similarity methods in information retrieval by leveraging reasoning, but finds current datasets and annotations limit the evaluation of their true potential.

Contribution

The study argues that LLM-based relevance judgment systems with reasoning can overcome the short-sightedness of similarity-based retrieval, and finds that standard datasets and their annotations do not adequately evaluate this advantage.

Findings

01. LLM-based relevance judgment with reasoning can potentially outperform embedding similarity.

02. Standard datasets may underestimate LLMs' capabilities because their human annotations are themselves short-sighted.

03. False positives of the reasoning LLM-RJS are primarily annotation mistakes, not model errors.

Abstract

With the emergence of Large Language Models (LLMs), new methods in Information Retrieval are available in which relevance is estimated directly through language understanding and reasoning, instead of embedding similarity. We argue that similarity is a short-sighted interpretation of relevance, and that LLM-Based Relevance Judgment Systems (LLM-RJS) (with reasoning) have potential to outperform Neural Embedding Retrieval Systems (NERS) by overcoming this limitation. Using the TREC-DL 2019 passage retrieval dataset, we compare various LLM-RJS with NERS, but observe no noticeable improvement. Subsequently, we analyze the impact of reasoning by comparing LLM-RJS with and without reasoning. We find that human annotations also suffer from short-sightedness, and that false-positives in the reasoning LLM-RJS are primarily mistakes in annotations due to short-sightedness. We conclude that…
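
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy vectors, and the `judge` callable are all hypothetical. In particular, `judge` is only a placeholder for whatever LLM-based relevance scorer one might plug in, and no real model or API is called here.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: Vector, passage_vecs: Dict[str, Vector]) -> List[str]:
    """Embedding-style ranking: passage ids sorted by similarity to the query."""
    return sorted(
        passage_vecs,
        key=lambda pid: cosine_similarity(query_vec, passage_vecs[pid]),
        reverse=True,
    )


def rank_by_judgment(
    query: str,
    passages: Dict[str, str],
    judge: Callable[[str, str], float],
) -> List[str]:
    """Judgment-style ranking: passage ids sorted by a relevance score.

    `judge` is a hypothetical stand-in for an LLM-based relevance scorer.
    """
    return sorted(passages, key=lambda pid: judge(query, passages[pid]), reverse=True)


def false_positives(predicted_relevant: Set[str], annotated_relevant: Set[str]) -> Set[str]:
    """Passages the system marked relevant that the annotations did not."""
    return predicted_relevant - annotated_relevant
```

Under these assumptions, the ids returned by `false_positives` are the cases worth inspecting by hand, since the abstract reports that such disagreements are often mistakes in the annotations rather than in the model.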

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Advanced Graph Neural Networks