Continual Learning for Generative Retrieval over Dynamic Corpora

Jiangui Chen; Ruqing Zhang; Jiafeng Guo; Maarten de Rijke; Wei Chen; Yixing Fan; Xueqi Cheng

arXiv:2308.14968·cs.IR·September 30, 2025

TL;DR

This paper introduces CLEVER (Continual-LEarner for generatiVE Retrieval), a continual learning model for generative retrieval over dynamic corpora. It indexes new documents incrementally while preserving the ability to retrieve previously indexed ones.

Contribution

The paper proposes a continual learning framework for generative retrieval with two components: Incremental Product Quantization, which encodes new documents into docids at low computational cost, and a memory-augmented learning mechanism.
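To make the idea of quantization-based docids concrete, here is a minimal sketch of plain product quantization with frozen codebooks. It is not the paper's Incremental Product Quantization: the rule CLEVER uses to adapt centroids as documents arrive is not reproduced here. The names `split`, `nearest`, `encode_docid`, and the toy `CODEBOOKS` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Codebook = List[List[float]]  # centroids of one subspace


def split(vec: Vector, num_groups: int) -> List[List[float]]:
    """Split a vector into num_groups equal-length sub-vectors."""
    if len(vec) % num_groups != 0:
        raise ValueError("vector length must be divisible by num_groups")
    size = len(vec) // num_groups
    return [list(vec[i * size:(i + 1) * size]) for i in range(num_groups)]


def nearest(sub: Vector, codebook: Codebook) -> int:
    """Return the index of the centroid closest to sub (squared Euclidean)."""
    def dist(centroid: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(sub, centroid))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))


def encode_docid(vec: Vector, codebooks: List[Codebook]) -> Tuple[int, ...]:
    """Map a document vector to a docid: one centroid index per subspace."""
    subs = split(vec, len(codebooks))
    return tuple(nearest(s, cb) for s, cb in zip(subs, codebooks))


# Hypothetical toy codebooks: 2 subspaces of dimension 2 (assumed, not from the paper).
CODEBOOKS: List[Codebook] = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [5.0, 5.0]],
]

old_doc = [0.1, -0.1, 0.9, 0.2]  # a previously indexed document vector
new_doc = [0.8, 1.1, 4.0, 6.0]   # a newly arriving document vector

# A new document is encoded against the existing codebooks, with no retraining.
print(encode_docid(old_doc, CODEBOOKS))
print(encode_docid(new_doc, CODEBOOKS))
```

With these toy values, `old_doc` maps to `(0, 1)` and `new_doc` to `(1, 2)`. Reusing the existing codebooks keeps encoding cheap, but it assumes new documents fall near existing centroids, which is the limitation an incremental scheme has to address.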

Findings

1. CLEVER achieves high retrieval accuracy on dynamic datasets.

2. The model efficiently updates document representations with low computational cost.

3. Empirical results show CLEVER outperforms baseline methods in continual retrieval tasks.

Abstract

Generative retrieval (GR) directly predicts the identifiers of relevant documents (i.e., docids) based on a parametric model. It has achieved solid performance on many ad-hoc retrieval tasks. So far, these tasks have assumed a static document collection. In many practical scenarios, however, document collections are dynamic, where new documents are continuously added to the corpus. The ability to incrementally index new documents while preserving the ability to answer queries with both previously and newly indexed relevant documents is vital to applying GR models. In this paper, we address this practical continual learning problem for GR. We put forward a novel Continual-LEarner for generatiVE Retrieval (CLEVER) model and make two major contributions to continual learning for GR: (i) To encode new documents into docids with low computational cost, we present Incremental Product…

Code & Models

Repositories

ict-bigdatalab/clever (PyTorch, official)
