Budget-Constrained Online Retrieval-Augmented Generation: The Chunk-as-a-Service Model
Shawqi Al-Maliki, Ammar Gharaibeh, Mohamed Rahouti, Mohammad Ruhul Amin, Mohamed Abdallah, Junaid Qadir, Ala Al-Fuqaha

TL;DR
This paper introduces Chunk-as-a-Service (CaaS), a transparent, cost-effective retrieval-augmented generation model with an online selection algorithm that improves budget efficiency and relevance in LLM applications.
Contribution
The paper proposes CaaS in two variants, LB-CaaS and OB-CaaS, together with an online utility-cost selection algorithm, aiming for greater transparency, cost-effectiveness, and accessibility than existing RaaS models.
Findings
UCOSA, the paper's online utility-cost selection algorithm, outperforms relevance-greedy and offline baselines by 52%.
LB-CaaS and OB-CaaS achieve 140% and 86% higher budget efficiency, respectively.
CaaS variants outperform RaaS in budget utilization and relevance.
Abstract
Large Language Models (LLMs) have revolutionized the field of natural language processing. However, they exhibit some limitations, including a lack of reliability and transparency: they may hallucinate and fail to provide sources that support the generated output. Retrieval-Augmented Generation (RAG) was introduced to address such limitations in LLMs. One popular implementation, RAG-as-a-Service (RaaS), has shortcomings that hinder its adoption and accessibility. For instance, RaaS pricing is based on the number of submitted prompts, without considering whether the prompts are enriched by relevant chunks (i.e., text segments retrieved from a vector database) or the quality of the utilized chunks (i.e., their degree of relevance). This results in an opaque and less cost-effective payment model. We propose Chunk-as-a-Service (CaaS) as a transparent and cost-effective alternative. CaaS…
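The idea of paying per chunk under a fixed budget, with chunks arriving one at a time, can be illustrated with a small sketch. This is not the paper's UCOSA, whose details are not given in this summary. It is a generic relevance-per-cost threshold rule, and the `Chunk` fields, the `min_ratio` parameter, and the `relevance_per_budget` metric are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    relevance: float  # hypothetical relevance score in [0, 1]
    cost: float       # hypothetical price charged for using this chunk


def select_online(chunks: Iterable[Chunk], budget: float, min_ratio: float) -> List[Chunk]:
    """Accept or reject each chunk as it arrives; decisions are never revisited.

    A chunk is accepted only if it fits in the remaining budget and its
    relevance-per-cost ratio is at least `min_ratio` (an assumed threshold).
    """
    selected: List[Chunk] = []
    remaining = budget
    for chunk in chunks:
        if chunk.cost <= 0:
            continue  # skip malformed prices rather than divide by zero
        if chunk.cost <= remaining and chunk.relevance / chunk.cost >= min_ratio:
            selected.append(chunk)
            remaining -= chunk.cost
    return selected


def relevance_per_budget(selected: List[Chunk], budget: float) -> float:
    """Total relevance obtained per unit of budget (an assumed metric)."""
    return sum(c.relevance for c in selected) / budget


# Hypothetical stream of retrieved chunks, in arrival order.
stream = [
    Chunk("a", relevance=0.9, cost=3.0),
    Chunk("b", relevance=0.2, cost=2.0),
    Chunk("c", relevance=0.8, cost=2.0),
    Chunk("d", relevance=0.7, cost=1.0),
]
picked = select_online(stream, budget=5.0, min_ratio=0.25)
print([c.chunk_id for c in picked], relevance_per_budget(picked, 5.0))
```

With this toy stream, chunk "b" is rejected for its low ratio, and chunk "d" is rejected because the budget is already spent. That shows the basic tension of an online rule: an early acceptance can crowd out a better chunk that arrives later.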
