Query Drift Compensation: Enabling Compatibility in Continual Learning of Retrieval Embedding Models

Dipam Goswami; Liying Wang; Bartłomiej Twardowski; Joost van de Weijer

arXiv:2506.00037·cs.IR·October 7, 2025

TL;DR

This paper introduces a query drift compensation method for continual learning of retrieval embedding models. The method keeps the updated model compatible with the previously indexed corpus, avoiding the cost of re-indexing.

Contribution

The paper proposes a query drift compensation technique that maintains embedding space compatibility during continual learning of retrieval models.

Findings

01 Significant performance improvement without re-indexing.

02 Effective reduction of embedding drift and forgetting.

03 Enhanced compatibility with old indexed data.

Abstract

Text embedding models enable semantic search, powering several NLP applications such as Retrieval-Augmented Generation through efficient information retrieval (IR). However, text embedding models are commonly studied in scenarios where the training data is static, which limits their applicability to dynamic scenarios where new training data emerges over time. IR methods generally encode a huge corpus of documents into low-dimensional embeddings and store them in a database index. During retrieval, a semantic search over the corpus is performed and the document whose embedding is most similar to the query embedding is returned. When an embedding model is updated with new training data, using the already indexed corpus is suboptimal because of the non-compatibility issue: the model that was used to obtain the corpus embeddings has changed. While re-indexing of old corpus documents using the…
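The retrieval setup and the non-compatibility issue described in the abstract can be illustrated with a small sketch. This is a toy illustration only, not the paper's method: the "models" are random linear maps, and the names `old_model`, `new_model`, `normalize`, and `retrieve` are hypothetical. It assumes NumPy is available and uses cosine similarity as the similarity measure.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def retrieve(query_emb, index):
    """Return, for each query, the position of the most similar indexed document."""
    scores = normalize(query_emb) @ normalize(index).T
    return scores.argmax(axis=1)

rng = np.random.default_rng(0)
dim, n_docs = 16, 50

# Toy stand-in for the original embedding model (assumption: a linear map).
old_model = rng.normal(size=(dim, dim))
docs = rng.normal(size=(n_docs, dim))

# The corpus is embedded once with the old model and stored as the index.
index = docs @ old_model

# Queries are copies of the first five documents, so the correct answers are 0..4.
queries = docs[:5]

# Same model for queries and index: embeddings are compatible.
same_model_hits = retrieve(queries @ old_model, index)

# Toy stand-in for an updated model (assumption: an unrelated linear map).
# Its query embeddings live in a different space than the stored index,
# so the retrieved positions are no longer guaranteed to be correct.
new_model = rng.normal(size=(dim, dim))
mismatched_hits = retrieve(queries @ new_model, index)

print("same model:   ", same_model_hits.tolist())
print("updated model:", mismatched_hits.tolist())
```

With a shared model, each query retrieves its own document. Once the queries are embedded by a different model, the stored index no longer corresponds to the query space, which is the incompatibility that makes either re-indexing or some form of compensation necessary.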

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Machine Learning and ELM