Task-Adaptive Embedding Refinement via Test-time LLM Guidance

Ariel Gera; Shir Ashury-Tahan; Gal Bloch; Ohad Eytan; Assaf Toledo

arXiv:2605.12487·cs.CL·May 13, 2026

TL;DR

This paper introduces a test-time, LLM-guided query refinement method that adapts embedding models to challenging zero-shot search and classification tasks, with reported relative improvements of up to 25%.

Contribution

The paper presents an approach that refines query embeddings at query time, using feedback from a generative LLM on a small set of documents, and so extends embedding models to complex zero-shot scenarios.

Findings

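01 Relative improvements of up to 25% on search and classification benchmarks (literature search, intent detection, key-point matching, and nuanced query-instruction following)

02 Refined queries lead to better ranking quality and clearer binary separation across the corpus

03 The method extends the utility of embedding models without costly LLM pipelines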

Abstract

We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling embeddings to adapt in real time to the target task. We conduct extensive experiments with state-of-the-art text embedding models across a diverse set of challenging search and classification benchmarks. Empirical results indicate that LLM-guided query refinement yields consistent gains across all models and datasets, with relative improvements of up to +25% in literature search, intent detection, key-point matching, and nuanced query-instruction following. The refined queries improve ranking quality and induce clearer binary separation across the corpus, enabling the…
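
As a rough illustration of the refinement loop described in the abstract, the following Python sketch retrieves a small candidate set, asks a judge for relevance labels, and shifts the query vector toward documents judged relevant and away from the rest. The update rule is a classic Rocchio-style relevance-feedback step, swapped in purely for illustration; the abstract does not specify the authors' actual update, so this should not be read as their method. All names (`refine_query`, `llm_judge`, the weights `alpha`, `beta`, `gamma`) and the toy 2-D vectors are hypothetical, and the LLM call is replaced by a hard-coded stub.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def mean(vectors: List[Vector], dim: int) -> Vector:
    """Component-wise mean; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rank(query: Sequence[float], docs: List[Vector]) -> List[int]:
    """Document indices sorted by descending cosine similarity to the query."""
    return sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)


def refine_query(
    query: Sequence[float],
    docs: List[Vector],
    judge: Callable[[int], bool],
    k: int = 4,
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.25,
) -> Vector:
    """Rocchio-style update (an assumption, not the paper's stated rule).

    Only the top-k documents under the original query are sent to the judge.
    """
    top = rank(query, docs)[:k]
    labels = {i: judge(i) for i in top}  # one judge call per candidate
    pos = [docs[i] for i in top if labels[i]]
    neg = [docs[i] for i in top if not labels[i]]
    p, n = mean(pos, len(query)), mean(neg, len(query))
    return normalize([alpha * q + beta * pp - gamma * nn for q, pp, nn in zip(query, p, n)])


# Toy data: hypothetical 2-D "embeddings"; documents 2 and 3 are the relevant ones.
docs = [[1.0, 0.0], [0.9, 0.1], [0.6, 0.8], [0.0, 1.0]]
query = [1.0, 0.2]


def llm_judge(doc_index: int) -> bool:
    """Stub standing in for a generative LLM's relevance judgment."""
    return doc_index in {2, 3}


refined = refine_query(query, docs, llm_judge)
print("before:", rank(query, docs))
print("after: ", rank(refined, docs))
```

In a realistic setting, `llm_judge` would wrap a call to a generative LLM and the vectors would come from an embedding model. Because only `k` documents are judged, the number of LLM calls stays small while the refined query can be scored against the whole corpus.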

Code & Models

Repositories

IBM/task-aware-embedding-refinement (GitHub)
