Seven Failure Points When Engineering a Retrieval Augmented Generation System
Scott Barnett, Stefanus Kurniawan, Srikanth Thudumu, Zach Brannelly, Mohamed Abdelrazek

TL;DR
This paper identifies seven key failure points in Retrieval Augmented Generation (RAG) systems, highlighting validation, robustness, and domain-specific challenges through three case studies in the research, education, and biomedical domains.
Contribution
The paper provides a detailed experience report on RAG system failures, offering seven specific failure points and insights for designing more robust systems.
Findings
Validation of RAG systems is only feasible during operation.
Robustness of a RAG system evolves over time rather than being designed in at the start.
Seven failure points to consider when engineering RAG systems.
Abstract
Software engineers are increasingly adding semantic search capabilities to applications using a strategy known as Retrieval Augmented Generation (RAG). A RAG system finds documents that semantically match a query and then passes those documents to a large language model (LLM) such as ChatGPT to extract the right answer. RAG systems aim to: a) reduce the problem of hallucinated responses from LLMs, b) link sources/references to generated responses, and c) remove the need for annotating documents with metadata. However, RAG systems suffer from limitations inherent to information retrieval systems and from reliance on LLMs. In this paper, we present an experience report on the failure points of RAG systems from three case studies from separate domains: research, education, and biomedical. We share the lessons learned and present 7 failure points to consider when…
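The retrieve-then-generate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words `embed` function stands in for a real embedding model, and `llm` is a hypothetical callable supplied by the caller (any function that takes a prompt string and returns a string). All function names here are assumptions made for the example.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, retrieved):
    """Number the retrieved documents so the answer can cite its sources."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved))
    return (
        "Answer using only the numbered sources below and cite them.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


def answer(query, documents, llm, top_k=2):
    """Retrieve, build the prompt, and delegate generation to `llm`.

    `llm` is a hypothetical callable (prompt -> text), not a real API.
    """
    retrieved = retrieve(query, documents, top_k)
    return llm(build_prompt(query, retrieved)), retrieved
```

Returning the retrieved documents alongside the generated text is one simple way to support goal (b), linking sources to responses. A real system would replace `embed` with a learned embedding model and a vector index.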
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · AI in Service Interactions
