Bidirectional RAG: Safe Self-Improving Retrieval-Augmented Generation Through Multi-Stage Validation

Teja Chinthala

arXiv:2512.22199 · cs.AI · December 30, 2025

TL;DR

Bidirectional RAG introduces a multi-stage validation process that lets retrieval-augmented generation systems safely expand their knowledge base by writing back high-quality generated responses. This nearly doubles coverage over standard RAG while guarding against hallucination pollution.

Contribution

It presents a novel architecture for RAG systems that enables safe, validated corpus expansion through multi-stage acceptance, improving knowledge coverage without hallucination pollution.

Findings

01 Achieves 40.58% average coverage, nearly double Standard RAG's 20.33%.

02 Adds 72% fewer documents to the corpus than naive write-back (140 vs. 500).

03 Demonstrates that self-improving RAG systems can be both feasible and safe.
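The two headline ratios can be checked directly from the figures quoted in the abstract: 40.58% vs. 20.33% average coverage, and 140 vs. 500 added documents. The variable names below are illustrative only.

```python
# Figures quoted in the abstract.
bidir_cov, std_cov = 40.58, 20.33    # average coverage, in percent
bidir_docs, naive_docs = 140, 500    # documents added to the corpus

# "Nearly doubling": ratio of Bidirectional RAG coverage to Standard RAG coverage.
coverage_ratio = bidir_cov / std_cov
# "72% fewer documents": relative reduction versus naive write-back.
doc_reduction = (naive_docs - bidir_docs) / naive_docs

print(f"{coverage_ratio:.2f}x coverage, {doc_reduction:.0%} fewer documents")
```

This prints `2.00x coverage, 72% fewer documents`. The coverage ratio is about 1.996, which is consistent with "nearly doubling."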

Abstract

Retrieval-Augmented Generation (RAG) systems enhance large language models by grounding responses in external knowledge bases, but conventional RAG architectures operate with static corpora that cannot evolve from user interactions. We introduce Bidirectional RAG, a novel RAG architecture that enables safe corpus expansion through validated write-back of high-quality generated responses. Our system employs a multi-stage acceptance layer combining grounding verification (NLI-based entailment, attribution checking, and novelty detection) to prevent hallucination pollution while enabling knowledge accumulation. Across four datasets (Natural Questions, TriviaQA, HotpotQA, Stack Overflow) with three random seeds (12 experiments per system), Bidirectional RAG achieves 40.58% average coverage, nearly doubling Standard RAG (20.33%), while adding 72% fewer documents than naive write-back (140 vs. 500). Our…
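The acceptance layer described above can be sketched as a gate in front of corpus write-back, under heavy assumptions. The paper uses NLI-based entailment. Here a toy lexical-overlap score stands in for the NLI model, Jaccard similarity stands in for novelty detection, and attribution is reduced to "every cited passage was actually retrieved." The names `Candidate`, `AcceptanceLayer`, `try_write_back`, `entail_min`, and `novelty_max`, along with both threshold values, are hypothetical. This is not the author's implementation.

```python
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased word set with surrounding punctuation stripped (crude tokenizer)."""
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap in [0, 1]; used here as a cheap near-duplicate score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Candidate:
    """A generated answer plus the ids of retrieved passages it cites (hypothetical shape)."""
    answer: str
    cited_ids: list[str]


@dataclass
class AcceptanceLayer:
    """Hypothetical multi-stage gate in front of corpus write-back."""
    corpus: dict[str, str] = field(default_factory=dict)
    entail_min: float = 0.6   # assumed grounding threshold
    novelty_max: float = 0.8  # assumed near-duplicate threshold
    _next_id: int = field(default=0, init=False, repr=False)

    def attributed(self, cand: Candidate, retrieved: dict[str, str]) -> bool:
        # Attribution: every cited id must refer to a passage that was actually retrieved.
        return bool(cand.cited_ids) and all(i in retrieved for i in cand.cited_ids)

    def grounded(self, cand: Candidate, retrieved: dict[str, str]) -> bool:
        # Toy stand-in for NLI entailment: share of answer tokens found in the cited evidence.
        evidence = set().union(*(tokens(retrieved[i]) for i in cand.cited_ids if i in retrieved))
        ans = tokens(cand.answer)
        return bool(ans) and len(ans & evidence) / len(ans) >= self.entail_min

    def novel(self, cand: Candidate) -> bool:
        # Novelty: reject answers that nearly duplicate any document already stored.
        ans = tokens(cand.answer)
        return all(jaccard(ans, tokens(doc)) < self.novelty_max for doc in self.corpus.values())

    def try_write_back(self, cand: Candidate, retrieved: dict[str, str]) -> bool:
        # Checks short-circuit in order; only a candidate passing all three is stored.
        if not (self.attributed(cand, retrieved)
                and self.grounded(cand, retrieved)
                and self.novel(cand)):
            return False
        self.corpus[f"gen-{self._next_id}"] = cand.answer
        self._next_id += 1
        return True
```

As an example, suppose the retrieved passages are "Paris is the capital of France." and "The Louvre museum is in Paris." A candidate such as "Paris, the capital of France, hosts the Louvre museum." that cites both passages would be accepted. Resubmitting a retrieved passage verbatim would fail the novelty check. An answer mostly about facts absent from its cited evidence would fail the grounding check. Ordering the cheap attribution check first avoids scoring candidates whose citations are already invalid.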

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques