L^2R: Lifelong Learning for First-stage Retrieval with Backward-Compatible Representations
Yinqiong Cai, Keping Bi, Yixing Fan, Jiafeng Guo, Wei Chen, Xueqi Cheng

TL;DR
This paper introduces L^2R, a lifelong learning approach for first-stage retrieval that adapts to evolving, unlabeled web data while maintaining backward-compatible representations to efficiently update retrieval indexes.
Contribution
The paper proposes a novel lifelong learning method for first-stage retrieval that adapts to new, unlabeled data while keeping representations backward-compatible, reducing the cost of updating the retrieval index.
Findings
L^2R outperforms baseline methods in simulated distribution drift scenarios.
The ranking alignment objective maintains backward compatibility without sacrificing performance.
The authors construct new benchmarks to evaluate lifelong learning in retrieval tasks.
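The exact ranking alignment objective is not spelled out on this page. As a hedged illustration only, the sketch below assumes one plausible form: a listwise KL divergence that asks the updated query encoder, scored against the old (already indexed) document embeddings, to reproduce the ranking distribution the old query encoder produced over the same candidates. The names `ranking_alignment_loss` and `softmax` and the `temperature` parameter are hypothetical; this is not the paper's exact formula.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner-product relevance score, as used by typical dense retrievers."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax turning candidate scores into a ranking distribution."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def ranking_alignment_loss(
    q_old: Sequence[float],
    q_new: Sequence[float],
    old_doc_embs: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """Hypothetical listwise alignment loss: KL(P_old || P_new).

    Both distributions are computed over the same candidates using the OLD
    document embeddings stored in the index, so minimizing this loss pushes
    the updated query encoder to stay compatible with that index.
    """
    p_old = softmax([dot(q_old, d) for d in old_doc_embs], temperature)
    p_new = softmax([dot(q_new, d) for d in old_doc_embs], temperature)
    return sum(po * math.log(po / pn) for po, pn in zip(p_old, p_new) if po > 0)


# Toy example: three candidate documents embedded by the old encoder.
old_docs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
q_old = [0.9, 0.1]
print(ranking_alignment_loss(q_old, q_old, old_docs))       # identical query encoders
print(ranking_alignment_loss(q_old, [0.1, 0.9], old_docs))  # drifted query encoder
```

Identical query embeddings give a loss of zero, and any change in how the candidates are ranked makes it positive. Lowering `temperature` sharpens both distributions so the loss concentrates on the top of the ranking; a pointwise squared error on raw scores would be another option.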
Abstract
First-stage retrieval is a critical task that aims to retrieve relevant document candidates from a large-scale collection. While existing retrieval models have achieved impressive performance, they are mostly studied on static data sets, ignoring that in the real world, data on the Web grows continuously and its distribution may drift. Consequently, retrievers trained on static old data may not suit newly arriving data well and inevitably produce sub-optimal results. In this work, we study lifelong learning for first-stage retrieval, focusing especially on the setting where the emerging documents are unlabeled, since relevance annotation is expensive and may not keep up with data emergence. Under this setting, we aim to design model updating with two goals: (1) to effectively adapt to the evolving distribution using the unlabeled newly arriving data, and (2) to avoid re-inferring all…
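Per the TL;DR, backward-compatible representations are what make index updates efficient. A minimal sketch, assuming a dense inner-product index: documents indexed before the update keep their stored embeddings from the old encoder, only newly arriving documents are encoded by the updated model, and queries from the updated query encoder search the mixed index. `MixedIndex` and `topk_overlap` are hypothetical names, and the embeddings are hand-picked toy vectors rather than outputs of a real encoder.

```python
import heapq
from typing import Dict, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner-product relevance score."""
    return sum(a * b for a, b in zip(u, v))


class MixedIndex:
    """Hypothetical dense index whose entries may come from different encoder versions."""

    def __init__(self) -> None:
        self.embs: Dict[str, Vector] = {}

    def add(self, doc_id: str, emb: Vector) -> None:
        self.embs[doc_id] = emb

    def search(self, q_emb: Vector, k: int) -> List[str]:
        """Return the ids of the k highest-scoring documents."""
        return heapq.nlargest(k, self.embs, key=lambda d: dot(q_emb, self.embs[d]))


def topk_overlap(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of ids shared by two top-k lists; 1.0 means identical result sets."""
    return len(set(a) & set(b)) / max(len(a), 1)


# Old documents keep the embeddings produced by the previous encoder (no re-inference).
index = MixedIndex()
index.add("d1", [1.0, 0.0])
index.add("d2", [0.0, 1.0])
# A newly arriving document is encoded once by the updated encoder.
index.add("d3", [0.7, 0.7])

q_new = [1.0, 0.2]  # query embedding from the updated query encoder
reused = index.search(q_new, k=2)

# For comparison only: a hypothetical fully re-encoded index.
full = MixedIndex()
for doc_id, emb in {"d1": [0.9, 0.1], "d2": [0.1, 0.9], "d3": [0.7, 0.7]}.items():
    full.add(doc_id, emb)
print(reused, topk_overlap(reused, full.search(q_new, k=2)))  # → ['d1', 'd3'] 1.0
```

In this toy case the reused index returns the same top-2 set as the fully re-encoded one. A low overlap on held-out queries would suggest the updated encoder has drifted away from the stored embeddings, so reusing the old index would no longer be safe.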
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
