Meta Knowledge for Retrieval Augmented Large Language Models
Laurent Mombaerts, Terry Ding, Adi Banerjee, Florian Felice, Jonathan Taws, Tarik Borogovac

TL;DR
This paper introduces a new data-centric retrieval augmented generation framework that uses meta knowledge and synthetic QA to improve information retrieval and synthesis in large language models, achieving higher accuracy and depth.
Contribution
The paper presents a prepare-then-rewrite-then-retrieve-then-read framework that adds synthetic QA, document metadata, and Meta Knowledge Summaries to RAG systems for better domain understanding.
Findings
Augmented queries with synthetic question matching outperform traditional RAG (p < 0.01)
Meta knowledge-augmented queries improve retrieval precision and answer quality
Cost-effective approach using Claude 3 Haiku for large-scale document processing
Abstract
Retrieval Augmented Generation (RAG) is a technique used to augment Large Language Models (LLMs) with contextually relevant, time-critical, or domain-specific information without altering the underlying model parameters. However, constructing RAG systems that can effectively synthesize information from a large and diverse set of documents remains a significant challenge. We introduce a novel data-centric RAG workflow for LLMs, transforming the traditional retrieve-then-read system into a more advanced prepare-then-rewrite-then-retrieve-then-read framework, to achieve higher domain expert-level understanding of the knowledge base. Our methodology relies on generating metadata and synthetic Questions and Answers (QA) for each document, as well as introducing the new concept of Meta Knowledge Summary (MK Summary) for metadata-based clusters of documents. The proposed innovations enable…
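As a rough illustration of the workflow the abstract describes, the sketch below walks through prepare, rewrite, and retrieve on toy data. It is not the authors' implementation. The QA pairs, the metadata tags, the `MK_SUMMARY` strings, and every function name are hypothetical. The paper generates QA pairs and summaries with an LLM, whereas here they are hand-written, the rewrite step is a plain string concatenation, and bag-of-words cosine similarity stands in for embedding search.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class SyntheticQA:
    doc_id: str
    tag: str       # metadata label of the source document (hypothetical)
    question: str  # synthetic question; retrieval matches against this
    answer: str


def tokens(text: str) -> Counter:
    """Lowercased bag of words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Prepare: hand-written stand-ins for LLM-generated QA pairs and MK Summaries.
QA_INDEX = [
    SyntheticQA("doc1", "retrieval", "How are document chunks ranked for a query?",
                "By embedding similarity."),
    SyntheticQA("doc2", "retrieval", "What is query rewriting used for?",
                "To align the query with the knowledge base."),
    SyntheticQA("doc3", "training", "Which optimizer was used for fine-tuning?",
                "Adam with weight decay."),
]
MK_SUMMARY = {
    "retrieval": "ranking document chunks, query rewriting",
    "training": "optimizers, fine-tuning",
}


def rewrite(query: str, tag: str) -> str:
    """Rewrite: append the tag's MK Summary (stand-in for an LLM rewrite)."""
    return f"{query} {MK_SUMMARY[tag]}"


def retrieve(query: str, tag: str, k: int = 2) -> list[SyntheticQA]:
    """Retrieve: rank synthetic questions against the augmented query."""
    augmented = tokens(rewrite(query, tag))
    ranked = sorted(
        QA_INDEX,
        key=lambda qa: cosine(augmented, tokens(qa.question)),
        reverse=True,
    )
    return ranked[:k]
```

With this toy index, `retrieve("Which one was used?", "training")` ranks the `doc3` question first. A read step would then pass the matched answers, or their source documents, to the LLM as context.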
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Sparse Evolutionary Training · Linear Layer · Attention Dropout · WordPiece · Layer Normalization · Multi-Head Attention · Linear Warmup With Linear Decay · Weight Decay · Adam
