Training the Knowledge Base through Evidence Distillation and Write-Back Enrichment
Yuxing Lu, Xukai Zhao, Wei Wu, Jinzhuo Wang

TL;DR
This paper introduces WriteBack-RAG, a framework that treats the knowledge base in a retrieval-augmented generation (RAG) system as a trainable component: it distills relevant evidence into compact knowledge units and indexes them alongside the original corpus, improving performance across multiple benchmarks.
Contribution
The paper presents a novel offline preprocessing method to refine the knowledge base in RAG systems, improving retrieval quality and generalization across different models and benchmarks.
Findings
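Gains average +2.14% across four RAG methods, six benchmarks, and two LLM backbones, with every evaluated setting improving
Distilling retrieved evidence into compact knowledge units improves RAG performance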
Knowledge benefits transfer across different RAG pipelines
Abstract
The knowledge base in a retrieval-augmented generation (RAG) system is typically assembled once and never revised, even though the facts a query requires are often fragmented across documents and buried in irrelevant content. We argue that the knowledge base should be treated as a trainable component and propose WriteBack-RAG, a framework that uses labeled examples to identify where retrieval succeeds, isolate the relevant documents, and distill them into compact knowledge units that are indexed alongside the original corpus. Because the method modifies only the corpus, it can be applied once as an offline preprocessing step and combined with any RAG pipeline. Across four RAG methods, six benchmarks, and two LLM backbones, WriteBack-RAG improves every evaluated setting, with gains averaging +2.14%. Cross-method transfer experiments further show that the distilled knowledge benefits RAG…
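The write-back loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the token-overlap retriever, the "gold answer appears in a retrieved document" test for retrieval success, and the names `retrieve`, `toy_distill`, and `write_back` are all assumptions made for illustration. In the paper the distillation step is presumably performed by an LLM; here a simple sentence filter stands in for it.

```python
import re
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer (hypothetical stand-in for a real retriever's analyzer)."""
    return re.findall(r"\w+", text.lower())


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by token overlap with the query."""
    q = Counter(tokenize(query))
    scored = [
        (sum((q & Counter(tokenize(doc))).values()), i)
        for i, doc in enumerate(corpus)
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, stable on ties
    return [corpus[i] for score, i in scored[:k] if score > 0]


def toy_distill(question: str, answer: str, docs: List[str]) -> str:
    """Stand-in distiller: keep only sentences that mention the answer.
    An assumption for illustration; it ignores the question entirely."""
    sentences = [s.strip() for d in docs for s in d.split(".") if s.strip()]
    kept = [s for s in sentences if answer.lower() in s.lower()]
    return ". ".join(kept) + "."


def write_back(
    corpus: List[str],
    labeled: List[Tuple[str, str]],
    distill: Callable[[str, str, List[str]], str] = toy_distill,
    k: int = 2,
) -> List[str]:
    """Return the original corpus plus distilled knowledge units.

    Assumed proxy for "retrieval succeeds": at least one retrieved
    document contains the gold answer string.
    """
    enriched = list(corpus)  # original documents are kept, never modified
    for question, answer in labeled:
        docs = retrieve(question, corpus, k)
        relevant = [d for d in docs if answer.lower() in d.lower()]
        if not relevant:
            continue  # retrieval did not surface the answer; nothing to write back
        unit = distill(question, answer, relevant)
        if unit not in enriched:
            enriched.append(unit)  # indexed alongside the original corpus
    return enriched
```

Because only the corpus changes, the enriched list can be handed to any retriever afterwards, which is the property that lets the method run once as an offline preprocessing step.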
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Biomedical Text Mining and Ontologies
