MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering
Kunal Sawarkar, Shivam R. Solanki, Abhilasha Mangal

TL;DR
MetaGen Blended RAG introduces a zero-shot enterprise question-answering method that enhances semantic retrieval with metadata and hybrid indexing, outperforming prior zero-shot RAG results without fine-tuning.
Contribution
The paper presents a novel approach combining metadata generation and hybrid indexing to improve zero-shot domain-specific retrieval in RAG systems, eliminating the need for fine-tuning.
Findings
Achieves 82% retrieval accuracy on PubMedQA
Surpasses all prior zero-shot RAG benchmarks
Rivals fine-tuned models on biomedical datasets
Abstract
Retrieval-Augmented Generation (RAG) struggles with domain-specific enterprise datasets, which are often isolated behind firewalls and rich in complex, specialized terminology unseen by LLMs during pre-training. Semantic variability across domains like medicine, networking, or law hampers RAG's context precision, while fine-tuning solutions are costly, slow, and generalize poorly as new data emerges. Achieving zero-shot precision with retrievers, without fine-tuning, remains a key challenge. We introduce 'MetaGen Blended RAG', a novel enterprise search approach that enhances semantic retrievers through a metadata generation pipeline and hybrid query indexes using dense and sparse vectors. By leveraging key concepts, topics, and acronyms, our method creates metadata-enriched semantic indexes and boosted hybrid queries, delivering robust, scalable performance without fine-tuning. On the…
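To make the idea of a metadata-enriched index with boosted hybrid queries more concrete, here is a minimal toy sketch. It is not the authors' pipeline: the documents, the metadata values, the function names (`toy_embed`, `blended_search`), the weights, and the `metadata_boost` parameter are all hypothetical. The sparse side is a plain term-count match standing in for a real sparse index, and the dense side is a hashed bag-of-words vector standing in for a real embedding model.

```python
import math
import re
import zlib
from collections import Counter

# Hypothetical documents; the metadata would come from a generation step
# (key concepts, topics, acronyms), which is hard-coded here for illustration.
DOCS = [
    {"id": "d1",
     "text": "Patients receiving the drug showed reduced blood pressure.",
     "metadata": {"key_concepts": ["hypertension"], "topics": ["cardiology"], "acronyms": ["BP"]}},
    {"id": "d2",
     "text": "The router forwards packets using the border gateway protocol.",
     "metadata": {"key_concepts": ["routing"], "topics": ["networking"], "acronyms": ["BGP"]}},
]

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def toy_embed(tokens, dim=64):
    """Hashed bag-of-words vector; a placeholder for a real dense encoder."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def blended_search(query, docs, sparse_weight=0.5, dense_weight=0.5, metadata_boost=2.0):
    """Rank docs by a weighted blend of a sparse and a dense score.

    Matches in the metadata fields count `metadata_boost` times as much as
    matches in the body text (an assumed boosting scheme).
    """
    q_tokens = tokenize(query)
    q_vec = toy_embed(q_tokens)
    sparse, dense = {}, {}
    for doc in docs:
        body = tokenize(doc["text"])
        meta = tokenize(" ".join(t for vals in doc["metadata"].values() for t in vals))
        body_counts, meta_counts = Counter(body), Counter(meta)
        sparse[doc["id"]] = sum(
            body_counts[t] + metadata_boost * meta_counts[t] for t in set(q_tokens)
        )
        # The dense vector is built from the enriched text (body plus metadata).
        dense[doc["id"]] = cosine(q_vec, toy_embed(body + meta))
    # Scale sparse scores to [0, 1] so they are comparable to cosine similarity.
    top = max(sparse.values(), default=0.0) or 1.0
    scores = {d: sparse_weight * sparse[d] / top + dense_weight * dense[d] for d in sparse}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for doc_id, score in blended_search("hypertension treatment", DOCS):
        print(doc_id, round(score, 3))
```

In this toy setup the query "hypertension treatment" shares no words with the body of `d1`, yet `d1` ranks first because the term appears in its metadata. That is the kind of vocabulary gap metadata enrichment is meant to close; a real system would replace both scoring functions with a production sparse index and a trained encoder.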
Taxonomy
Methods: RAG
