RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots

Philip Feldman; James R. Foulds; Shimei Pan

arXiv:2403.01193 · cs.CL · June 13, 2024 · 5 cites

TL;DR

This paper investigates how Retrieval-Augmented Generation (RAG) can reduce hallucinations in large language models by integrating external knowledge, highlighting its benefits and limitations through empirical evaluation.

Contribution

It provides an empirical assessment of RAG's effectiveness in mitigating hallucinations in LLMs and discusses practical deployment considerations.

Findings

01. RAG improves accuracy in some hallucination scenarios.

02. RAG can still be misled by prompts that directly contradict the model's pre-trained understanding.

03. Hallucinations remain a complex challenge for LLMs.

Abstract

Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate -- generate plausible but false information -- poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer…
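The abstract describes RAG as integrating external knowledge with prompts. The Python below is a minimal sketch of that idea, not the authors' pipeline. It retrieves the passages most similar to a question and prepends them to the prompt. Similarity here is bag-of-words cosine similarity, a deliberately simple stand-in for the dense embeddings and vector stores that production systems typically use. The function names, the instruction wording, and the example corpus are all hypothetical. The returned string would then be sent to whatever LLM is in use.

```python
import math
import re
from collections import Counter

# Hypothetical example corpus of "external knowledge" passages.
EXAMPLE_CORPUS = [
    "Retrieval-Augmented Generation adds retrieved passages to the model prompt.",
    "The Eiffel Tower is located in Paris, France.",
    "Photosynthesis converts light energy into chemical energy.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (hypothetical retriever)."""
    q_vec = Counter(tokenize(query))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(q_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(question: str, corpus: list[str], k: int = 2) -> str:
    """Prepend retrieved passages to the question (hypothetical prompt template)."""
    context = retrieve(question, corpus, k)
    lines = [
        "Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.",
        "",
        "Context:",
    ]
    lines += [f"- {doc}" for doc in context]
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_rag_prompt("Where is the Eiffel Tower located?", EXAMPLE_CORPUS, k=1))
```

Grounding the prompt this way does not by itself prevent hallucination: the abstract reports that RAG can still be misled when prompts directly contradict the model's pre-trained understanding.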

Taxonomy

Topics: AI in Service Interactions · FinTech, Crowdfunding, Digital Finance · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · WordPiece · Layer Normalization · Byte Pair Encoding · Dropout · Multi-Head Attention · Attention Dropout · Linear Warmup With Linear Decay