Deploying Large Language Models With Retrieval Augmented Generation

Sonal Prabhune; Donald J. Berndt

arXiv:2411.11895·cs.IR·November 20, 2024·2 cites

TL;DR

This paper discusses deploying Large Language Models with Retrieval Augmented Generation (RAG), highlighting real-world implementation, benefits, challenges, and best practices for integrating external data sources to improve factual accuracy.

Contribution

It presents practical insights from a pilot project deploying LLMs with RAG, including best practices, governance models, and real-world application considerations.

Findings

01. Enhanced factual accuracy in LLM outputs
02. Identified key challenges in real-world deployment
03. Proposed governance framework for AI compliance

Abstract

Knowing that the generative capabilities of large language models (LLMs) are sometimes hampered by tendencies to hallucinate or create non-factual responses, researchers have increasingly focused on methods to ground generated outputs in factual data. Retrieval Augmented Generation (RAG) has emerged as a key approach for integrating knowledge from data sources outside of the LLM's training set, including proprietary and up-to-date information. While many research papers explore various RAG strategies, their true efficacy is tested in real-world applications with actual data. The journey from conceiving an idea to actualizing it in the real world is a lengthy process. We present insights from the development and field-testing of a pilot project that integrates LLMs with RAG for information retrieval. Additionally, we examine the impacts on the information value chain, encompassing people,…
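
To make the retrieve-then-generate idea in the abstract concrete, here is a minimal sketch of the general RAG pattern. It is not the authors' pilot system: the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the bag-of-words scoring, and the prompt wording are all assumptions chosen for illustration. A real deployment would typically use an embedding model and a vector store, and would pass the assembled prompt to an LLM, which is left out here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents ranked by similarity to the query.

    Hypothetical stand-in for a real retriever: it scores raw term
    counts instead of learned embeddings.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no terms with the query.
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Assemble a grounded prompt; the wording is illustrative only."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    # Made-up example documents standing in for a proprietary corpus.
    docs = [
        "The travel policy caps hotel rates at 200 dollars per night.",
        "The cafeteria opens at 8 am.",
        "Expense reports are due within 30 days of travel.",
    ]
    question = "What is the hotel rate cap in the travel policy?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

The design point is the separation of steps: retrieval selects passages from data outside the model's training set, and the prompt instructs the model to answer from those passages, which is what grounds the output.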

Code & Models

Repositories

SonalPrabhune/RAG (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Weight Decay · Byte Pair Encoding · Linear Layer · Softmax · BERT · Multi-Head Attention