Aligning Dense Retrievers with LLM Utility via Distillation

Rajinder Sandhu; Di Mu; Cheng Chang; Md Shahriar Tasjid; Himanshu Rai; Maksims Volkovs; Ga Wu

arXiv:2604.22722·cs.IR·April 27, 2026

TL;DR

This paper introduces Utility-Aligned Embeddings (UAE), a retrieval framework that distills the utility signal of LLM re-ranking into dense embeddings, improving Recall@1 by 30.59% on QASPER while retrieving over 180x faster than LLM re-ranking methods.

Contribution

UAE formulates retrieval as a distribution matching problem, enabling utility-based signals to be embedded directly into dense representations without test-time LLM inference.

Findings

01. UAE improves Recall@1 by 30.59% on QASPER.

02. UAE achieves over 180x faster retrieval than LLM re-ranking methods.

03. UAE outperforms strong semantic baselines in key metrics.

Abstract

Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search can suffer from precision limitations. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a practical, high-performance retrieval method. We formulate retrieval as a distribution matching problem, training a bi-encoder to imitate a utility distribution derived from perplexity reduction using a Utility-Modulated InfoNCE objective. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16% and Token F1 by…
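
The distribution matching idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the Utility-Modulated InfoNCE objective, so the code below is only a generic soft-label InfoNCE under stated assumptions: utility is taken to be the drop in log-perplexity of the answer when a passage is added to the prompt, the teacher distribution is a softmax over those utilities, and the student distribution is a temperature-scaled softmax over bi-encoder similarities. All function names, temperatures, and numbers are hypothetical and chosen for illustration.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in xs]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def utility_targets(ppl_without, ppl_with, temperature=1.0):
    """Teacher distribution over candidate passages.

    Assumption (not from the paper): the utility of a passage is the
    reduction in log-perplexity of the answer when that passage is
    included in the LLM prompt.
    """
    utilities = [math.log(ppl_without) - math.log(p) for p in ppl_with]
    return softmax(utilities, temperature)


def soft_infonce(similarities, targets, temperature=0.05):
    """Cross-entropy between the teacher targets and the student softmax.

    With one-hot targets this reduces to the standard InfoNCE loss;
    graded targets let partially useful passages contribute.
    """
    student = softmax(similarities, temperature)
    return -sum(t * math.log(s) for t, s in zip(targets, student))


# Toy example with three candidate passages (made-up perplexities).
ppl_without = 20.0               # answer perplexity with no passage
ppl_with = [5.0, 18.0, 21.0]     # answer perplexity with each passage
targets = utility_targets(ppl_without, ppl_with)

# Hypothetical bi-encoder similarities for the same three passages.
aligned = soft_infonce([0.9, 0.3, 0.1], targets)
misaligned = soft_infonce([0.1, 0.3, 0.9], targets)
print(targets, aligned, misaligned)
```

In this toy setup the loss is lower when the similarity ranking agrees with the utility ranking, which is the behaviour a distillation objective of this kind is meant to reward. At query time only the bi-encoder similarities are needed, consistent with the abstract's claim that no test-time LLM inference is required.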
