Budget-Constrained Online Retrieval-Augmented Generation: The Chunk-as-a-Service Model

Shawqi Al-Maliki; Ammar Gharaibeh; Mohamed Rahouti; Mohammad Ruhul Amin; Mohamed Abdallah; Junaid Qadir; Ala Al-Fuqaha

arXiv:2604.26981·cs.IR·May 1, 2026

TL;DR

This paper introduces Chunk-as-a-Service (CaaS), a transparent, cost-effective retrieval-augmented generation model with an online selection algorithm that improves budget efficiency and relevance in LLM applications.

Contribution

The paper proposes CaaS in two variants (LB-CaaS and OB-CaaS) together with an online utility-cost selection algorithm (UCOSA), improving transparency, cost-effectiveness, and accessibility over existing RAG-as-a-Service (RaaS) models.

Findings

1. UCOSA outperforms relevance-greedy and offline baselines by 52%.

2. LB-CaaS and OB-CaaS achieve 140% and 86% higher budget efficiency, respectively.

3. The CaaS variants outperform RaaS in budget utilization and relevance.

Abstract

Large Language Models (LLMs) have revolutionized the field of natural language processing. However, they exhibit some limitations, including a lack of reliability and transparency: they may hallucinate and fail to provide sources that support the generated output. Retrieval-Augmented Generation (RAG) was introduced to address such limitations in LLMs. One popular implementation, RAG-as-a-Service (RaaS), has shortcomings that hinder its adoption and accessibility. For instance, RaaS pricing is based on the number of submitted prompts, without considering whether the prompts are enriched by relevant chunks, i.e., text segments retrieved from a vector database, or the quality of the utilized chunks (i.e., their degree of relevance). This results in an opaque and less cost-effective payment model. We propose Chunk-as-a-Service (CaaS) as a transparent and cost-effective alternative. CaaS…
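
To make the idea of budget-constrained online chunk selection concrete, here is a minimal sketch in Python. It is not the paper's UCOSA algorithm, whose details are not given in the text above. It is a generic fixed-threshold rule of the kind used for online knapsack problems, in which chunks arrive one at a time and each is accepted or rejected irrevocably. The `Chunk` fields, the `select_online` function, the per-chunk prices, and the `min_ratio` threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    relevance: float  # hypothetical relevance score in [0, 1]
    cost: float       # hypothetical price charged for using this chunk


def select_online(chunks: Iterable[Chunk], budget: float, min_ratio: float) -> List[Chunk]:
    """Accept each arriving chunk irrevocably if it is affordable and its
    relevance-per-cost ratio meets a fixed threshold.

    Illustrative rule only; an assumption, not the method from the paper.
    """
    selected: List[Chunk] = []
    remaining = budget
    for chunk in chunks:
        if chunk.cost <= 0:
            continue  # skip malformed prices to avoid division by zero
        if chunk.cost > remaining:
            continue  # cannot afford this chunk with the remaining budget
        if chunk.relevance / chunk.cost >= min_ratio:
            selected.append(chunk)
            remaining -= chunk.cost
    return selected


# Hypothetical stream of retrieved chunks, in arrival order.
stream = [
    Chunk("a", relevance=0.9, cost=2.0),  # ratio 0.45
    Chunk("b", relevance=0.2, cost=1.0),  # ratio 0.20
    Chunk("c", relevance=0.8, cost=1.0),  # ratio 0.80
    Chunk("d", relevance=0.7, cost=2.0),  # ratio 0.35, unaffordable by then
]
picked = select_online(stream, budget=3.0, min_ratio=0.4)
print([c.chunk_id for c in picked])
```

With a budget of 3.0 and a threshold of 0.4, the sketch accepts chunks `a` and `c` and spends the whole budget. A relevance-greedy rule that ignores cost could instead spend the same budget on fewer or less cost-efficient chunks, which is the kind of trade-off a utility-cost criterion is meant to address.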
