CorpusBrain++: A Continual Generative Pre-Training Framework for Knowledge-Intensive Language Tasks
Jiafeng Guo, Changjiang Zhou, Ruqing Zhang, Jiangui Chen, Maarten de Rijke, Yixing Fan, Xueqi Cheng

TL;DR
CorpusBrain++ is a continual generative pre-training framework for knowledge-intensive language tasks. It targets a source corpus to which new documents are continuously added, a setting that static generative retrieval models such as CorpusBrain do not address.
Contribution
The paper introduces the continual document learning (CDL) task and the KILT++ benchmark, and proposes CorpusBrain++, a continual learning approach that mitigates catastrophic forgetting in dynamic retrieval scenarios.
Findings
CorpusBrain++ outperforms traditional IR methods in dynamic settings.
CorpusBrain++ significantly reduces catastrophic forgetting.
Empirical results show improved retrieval performance and efficiency.
Abstract
Knowledge-intensive language tasks (KILTs) typically require retrieving relevant documents from trustworthy corpora, e.g., Wikipedia, to produce specific answers. Very recently, a pre-trained generative retrieval model for KILTs, named CorpusBrain, was proposed and reached new state-of-the-art retrieval performance. However, most existing research on KILTs, including CorpusBrain, has predominantly focused on a static document collection, overlooking the dynamic nature of real-world scenarios, where new documents are continuously being incorporated into the source corpus. To address this gap, it is crucial to explore the capability of retrieval models to effectively handle the dynamic retrieval scenario inherent in KILTs. In this work, we first introduce the continual document learning (CDL) task for KILTs and build a novel benchmark dataset named KILT++ based on the original KILT…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning
