RAG-Stack: Co-Optimizing RAG Quality and Performance From the Vector Database Perspective
Wenqi Jiang

TL;DR
This paper introduces RAG-Stack, a framework for jointly optimizing the generation quality and system performance of retrieval-augmented generation (RAG) systems. It does so by decoupling quality from performance concerns, modeling system cost, and exploring candidate configurations.
Contribution
It proposes a three-pillar blueprint for RAG system optimization: an intermediate representation (RAG-IR), a cost model (RAG-CM), and a plan exploration algorithm.
Findings
RAG-Stack effectively balances quality and performance in RAG systems.
The RAG-IR abstraction decouples quality and performance considerations.
The RAG-CM accurately estimates system performance for different configurations.
Abstract
Retrieval-augmented generation (RAG) has emerged as one of the most prominent applications of vector databases. By integrating documents retrieved from a database into the prompt of a large language model (LLM), RAG enables more reliable and informative content generation. While there has been extensive research on vector databases, many open research problems remain once they are considered in the wider context of end-to-end RAG pipelines. One practical yet challenging problem is how to jointly optimize both system performance and generation quality in RAG, which is significantly more complex than it appears due to the numerous knobs on both the algorithmic side (spanning models and databases) and the systems side (from software to hardware). In this paper, we present RAG-Stack, a three-pillar blueprint for quality-performance co-optimization in RAG systems. RAG-Stack comprises: (1)…
Taxonomy
Topics: Information Retrieval and Search Behavior · Natural Language Processing Techniques · Text and Document Classification Technologies
