Deploying Large Language Models With Retrieval Augmented Generation
Sonal Prabhune, Donald J. Berndt

TL;DR
This paper discusses deploying Large Language Models with Retrieval Augmented Generation (RAG), highlighting real-world implementation, benefits, challenges, and best practices for integrating external data sources to improve factual accuracy.
Contribution
It presents practical insights from a pilot project deploying LLMs with RAG, including best practices, governance models, and real-world application considerations.
Findings
Enhanced factual accuracy in LLM outputs
Identified key challenges in real-world deployment
Proposed governance framework for AI compliance
Abstract
Because the generative capabilities of large language models (LLMs) are sometimes hampered by a tendency to hallucinate or produce non-factual responses, researchers have increasingly focused on methods to ground generated outputs in factual data. Retrieval Augmented Generation (RAG) has emerged as a key approach for integrating knowledge from data sources outside of the LLM's training set, including proprietary and up-to-date information. While many research papers explore various RAG strategies, their true efficacy is tested in real-world applications with actual data, and the journey from conceiving an idea to realizing it in practice is lengthy. We present insights from the development and field-testing of a pilot project that integrates LLMs with RAG for information retrieval. Additionally, we examine the impacts on the information value chain, encompassing people,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Weight Decay · Byte Pair Encoding · Linear Layer · Softmax · BERT · Multi-Head Attention
