Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval
Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

TL;DR
This paper introduces BootRet, a dynamic pre-training method for generative retrieval that updates document identifiers during training, improving retrieval performance, especially in zero-shot scenarios.
Contribution
The paper proposes BootRet, a novel bootstrapped pre-training approach that dynamically adjusts document identifiers, enhancing generative retrieval models beyond static identifier methods.
Findings
BootRet outperforms existing pre-training methods in retrieval tasks.
BootRet achieves strong zero-shot retrieval performance.
Dynamic identifier updating improves model memorization and relevance prediction.
Abstract
Generative retrieval uses differentiable search indexes to directly generate relevant document identifiers in response to a query. Recent studies have highlighted the potential of a strong generative retrieval model, trained with carefully crafted pre-training tasks, to enhance downstream retrieval tasks via fine-tuning. However, the full power of pre-training for generative retrieval remains underexploited due to its reliance on pre-defined static document identifiers, which may not align with evolving model parameters. In this work, we introduce BootRet, a bootstrapped pre-training method for generative retrieval that dynamically adjusts document identifiers during pre-training to accommodate the continuing memorization of the corpus. BootRet involves three key training phases: (i) initial identifier generation, (ii) pre-training via corpus indexing and relevance prediction tasks, and…
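To make the abstract's first sentence concrete, here is a minimal sketch of how a generative retriever can be restricted to emit only identifiers that exist in the corpus. It is not BootRet and not the authors' code: the identifier format (tuples of string tokens), the prefix-free assumption, the greedy decoding, and the `toy_score` function (a stand-in for a trained sequence-to-sequence model) are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple

DocId = Tuple[str, ...]


def build_trie(docids: Iterable[DocId]) -> Dict:
    """Build a prefix trie over identifiers (assumed prefix-free)."""
    trie: Dict = {}
    for docid in docids:
        node = trie
        for token in docid:
            node = node.setdefault(token, {})
    return trie


def toy_score(query: str, prefix: DocId, token: str) -> float:
    """Hypothetical scorer: 1.0 if the token appears in the query, else 0.0.

    A real system would use the model's next-token probability here.
    """
    return 1.0 if token in query.lower().split() else 0.0


def constrained_greedy_decode(
    query: str,
    trie: Dict,
    score_fn: Callable[[str, DocId, str], float],
) -> DocId:
    """Pick the best-scoring allowed token at each step until a leaf is reached."""
    prefix: DocId = ()
    node = trie
    while node:  # an empty dict marks the end of a valid identifier
        best = max(node, key=lambda tok: score_fn(query, prefix, tok))
        prefix += (best,)
        node = node[best]
    return prefix


# Illustrative corpus of static identifiers (made up for this sketch).
docids = [("sports", "football"), ("sports", "tennis"), ("science", "physics")]
trie = build_trie(docids)
result = constrained_greedy_decode("latest tennis results in sports", trie, toy_score)
print(result)
```

Because every step only considers children of the current trie node, the decoded sequence is always a valid identifier. In this sketch the identifiers are fixed up front, which corresponds to the static setting the abstract contrasts BootRet against.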
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
Methods: ALIGN
