Seven Failure Points When Engineering a Retrieval Augmented Generation System

Scott Barnett; Stefanus Kurniawan; Srikanth Thudumu; Zach Brannelly; Mohamed Abdelrazek

arXiv:2401.05856 · cs.SE · February 5, 2024 · 6 cites

Open Access

TL;DR

This paper identifies seven key failure points in Retrieval Augmented Generation systems, highlighting challenges in validation, robustness, and domain-specific issues through three case studies across research, education, and biomedical fields.

Contribution

The paper provides a detailed experience report on RAG system failures, offering seven specific failure points and insights for designing more robust systems.

Findings

01

Validation of RAG systems is only feasible during operation.

02

Robustness of a RAG system evolves over time rather than being designed in at the start.

03

There are seven failure points to consider when engineering RAG systems.

Abstract

Software engineers are increasingly adding semantic search capabilities to applications using a strategy known as Retrieval Augmented Generation (RAG). A RAG system finds documents that semantically match a query and then passes those documents to a large language model (LLM) such as ChatGPT to extract the right answer. RAG systems aim to: a) reduce the problem of hallucinated responses from LLMs, b) link sources/references to generated responses, and c) remove the need for annotating documents with metadata. However, RAG systems suffer from limitations inherent to information retrieval systems and from reliance on LLMs. In this paper, we present an experience report on the failure points of RAG systems from three case studies from separate domains: research, education, and biomedical. We share the lessons learned and present 7 failure points to consider when…
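
The retrieve-then-generate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, `llm` is any caller-supplied function that maps a prompt string to a response string, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Return the top_k documents ranked by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_docs):
    """Number the retrieved sources so the response can reference them."""
    sources = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context_docs))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


def answer(query, documents, llm, top_k=2):
    """Retrieve context, then pass the query plus context to the LLM callable."""
    context = retrieve(query, documents, top_k)
    return llm(build_prompt(query, context)), context
```

Returning the retrieved context alongside the response is one way to support the stated aim of linking sources to generated answers; a production system would replace `embed` with a real embedding model and `llm` with a call to an actual model.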

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · AI in Service Interactions