OBLIQ-Bench: Exposing Overlooked Bottlenecks in Modern Retrievers with Latent and Implicit Queries

Diane Tchuindjo; Devavrat Shah; Omar Khattab

arXiv:2605.06235·cs.IR·May 8, 2026

TL;DR

OBLIQ-Bench is a benchmark for evaluating retrieval systems on oblique queries, which seek documents that instantiate a latent pattern. It shows that even sophisticated retrieval pipelines fail to surface most of the relevant documents.

Contribution

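The paper presents OBLIQ-Bench, a suite of five oblique search problems over real long-tail corpora, and uses it to expose an overlooked asymmetry between retrieval and verification: reasoning LLMs recognize latent relevance once a relevant document is surfaced, but retrieval pipelines fail to surface most relevant documents in the first place.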

Findings

01 Even sophisticated retrieval pipelines fail to surface most of the relevant documents for oblique queries.

02 Reasoning LLMs reliably recognize latent relevance once relevant documents are surfaced.

03 OBLIQ-Bench exposes the limits of current retrieval architectures on latent and implicit relevance signals.
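
The gap between findings 01 and 02 can be illustrated with a toy retrieve-then-verify pipeline. The sketch below is not the paper's evaluation protocol: the four-document corpus, the query, the relevance labels, and every function name are hypothetical, the token-overlap ranker is a stand-in for a real retriever, and the oracle lookup is a stand-in for a reasoning LLM acting as verifier. It only shows why a perfect verifier cannot help when the relevant documents never reach it.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy corpus: d1 and d3 oppose a policy only implicitly (sarcasm),
# while d2 and d4 share words with the query but express no stance.
corpus: Dict[str, str] = {
    "d1": "great another hour in line thanks to the new rules",
    "d2": "the policy opposes nothing it is a policy about parking",
    "d3": "cannot wait to fill in form number nine again",
    "d4": "the city published the policy text today",
}
relevant: Set[str] = {"d1", "d3"}  # assumed gold labels for this illustration
query = "tweets that oppose the policy"


def lexical_rank(query: str, corpus: Dict[str, str]) -> List[str]:
    """Rank doc ids by distinct tokens shared with the query (ties by id)."""
    q_tokens = set(query.lower().split())

    def score(doc_id: str) -> int:
        return len(q_tokens & set(corpus[doc_id].lower().split()))

    return sorted(corpus, key=lambda d: (-score(d), d))


def retrieve_then_verify(
    ranked: List[str], verifier: Callable[[str], bool], k: int
) -> List[str]:
    """Keep only the top-k documents that the verifier accepts."""
    return [doc_id for doc_id in ranked[:k] if verifier(doc_id)]


def recall(found: List[str], relevant: Set[str]) -> float:
    """Fraction of relevant documents present in `found`."""
    return len(set(found) & relevant) / len(relevant) if relevant else 0.0


def oracle_verifier(doc_id: str) -> bool:
    """Stand-in for an LLM judge: always correct on whatever it is shown."""
    return doc_id in relevant


ranked = lexical_rank(query, corpus)
shallow = retrieve_then_verify(ranked, oracle_verifier, k=2)
deep = retrieve_then_verify(ranked, oracle_verifier, k=4)

print(ranked)                     # ['d2', 'd4', 'd1', 'd3']
print(recall(shallow, relevant))  # 0.0
print(recall(deep, relevant))     # 1.0
```

With a budget of two documents the pipeline recovers nothing even though the verifier never makes a mistake; recall only reaches 1.0 when the verifier is shown the whole corpus, which is the exhaustive scan that a retriever is meant to avoid.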

Abstract

Retrieval benchmarks are increasingly saturating, but we argue that efficient search is far from a solved problem. We identify a class of queries we call oblique, which seek documents that instantiate a latent pattern, like finding all tweets that express an implicit stance, chat logs that demonstrate a particular failure mode, or transcripts that match an abstract scenario. We study three mechanisms through which obliqueness may arise and introduce OBLIQ-Bench, a suite of five oblique search problems over real long-tail corpora. OBLIQ-Bench exposes an overlooked asymmetry between retrieval and verification, where reasoning LLMs reliably recognize latent relevance whenever relevant documents are surfaced, but even sophisticated retrieval pipelines fail to surface most relevant documents in the first place. We hope that OBLIQ-Bench will drive research into retrieval architectures that…

Code & Models

Datasets

dianetc/OBLIQ-Bench
dataset · 859 downloads
