BudgetLeak: Membership Inference Attacks on RAG Systems via the Generation Budget Side Channel
Hao Li, Jiajun He, Guangshuo Wang, Dengguo Feng, Zheng Li, Min Zhang

TL;DR
This paper introduces BudgetLeak, a side-channel membership inference attack on RAG systems that exploits the generation budget (the maximum number of tokens allowed in a response), revealing a new privacy risk in these systems.
Contribution
The paper uncovers the generation budget as a side channel in RAG systems and proposes BudgetLeak, a new attack method that outperforms existing approaches.
Findings
BudgetLeak effectively infers membership across multiple datasets and models.
The attack outperforms existing baselines in accuracy and efficiency.
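Varying the generation budget changes response behavior differently for member and non-member queries: members gain quality more rapidly as the budget grows, which enables inference.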
Abstract
Retrieval-Augmented Generation (RAG) enhances large language models by integrating external knowledge, but reliance on proprietary or sensitive corpora poses various data risks, including privacy leakage and unauthorized data usage. Membership inference attacks (MIAs) are a common technique to assess such risks, yet existing approaches underperform in RAG due to black-box constraints and the absence of strong membership signals. In this paper, we identify a previously unexplored side channel in RAG systems: the generation budget, which controls the maximum number of tokens allowed in a generated response. Varying this budget reveals observable behavioral patterns between member and non-member queries, as members gain quality more rapidly with larger budgets. Building on this insight, we propose BudgetLeak, a novel membership inference attack that probes responses under different budgets…
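The budget-probing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated before the attack details, so everything here is an assumption made for illustration. The names `make_toy_rag`, `budget_quality_curve`, `quality_slope`, and `infer_membership` are hypothetical, unigram-overlap F1 stands in for whatever quality measure the paper uses, the slope-versus-threshold decision rule is a placeholder, and the toy RAG system simply echoes a stored document's continuation up to the token budget.

```python
from collections import Counter
from typing import Callable, Sequence

QueryFn = Callable[[str, int], str]  # (prompt, max_tokens) -> response


def token_f1(response: str, reference: str) -> float:
    """Unigram-overlap F1; an assumed stand-in for a response-quality metric."""
    resp, ref = response.lower().split(), reference.lower().split()
    if not resp or not ref:
        return 0.0
    overlap = sum((Counter(resp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(resp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def budget_quality_curve(query_fn: QueryFn, prompt: str, reference: str,
                         budgets: Sequence[int]) -> list:
    """Query the system once per budget and score each response."""
    return [token_f1(query_fn(prompt, b), reference) for b in budgets]


def quality_slope(budgets: Sequence[int], qualities: Sequence[float]) -> float:
    """Least-squares slope of quality against budget."""
    n = len(budgets)
    mean_b, mean_q = sum(budgets) / n, sum(qualities) / n
    var_b = sum((b - mean_b) ** 2 for b in budgets)
    if var_b == 0:
        return 0.0
    cov = sum((b - mean_b) * (q - mean_q) for b, q in zip(budgets, qualities))
    return cov / var_b


def infer_membership(query_fn: QueryFn, prompt: str, reference: str,
                     budgets: Sequence[int], threshold: float) -> bool:
    """Flag a candidate as a member if quality rises fast enough with budget.
    The threshold is a free parameter in this sketch, not a value from the paper."""
    curve = budget_quality_curve(query_fn, prompt, reference, budgets)
    return quality_slope(budgets, curve) > threshold


def make_toy_rag(corpus: Sequence[str]) -> QueryFn:
    """Toy stand-in for a RAG system: if a stored document starts with the
    prompt, return its continuation truncated to max_tokens words;
    otherwise return uninformative filler of the same length."""
    def query(prompt: str, max_tokens: int) -> str:
        for doc in corpus:
            if doc.startswith(prompt):
                return " ".join(doc[len(prompt):].split()[:max_tokens])
        return " ".join(["unknown"] * max_tokens)
    return query


def split_candidate(doc: str, prompt_words: int = 5):
    """Use the first words of a candidate as the query, the rest as reference."""
    words = doc.split()
    return " ".join(words[:prompt_words]), " ".join(words[prompt_words:])


member_doc = ("the clinic recorded that patient records were moved to a new "
              "archive after the spring audit finished")
outside_doc = ("the museum announced that ticket prices would stay fixed until "
               "the autumn exhibition opened downtown")
rag = make_toy_rag([member_doc])
budgets = [2, 4, 8]

member_prompt, member_ref = split_candidate(member_doc)
outside_prompt, outside_ref = split_candidate(outside_doc)
member_score = quality_slope(
    budgets, budget_quality_curve(rag, member_prompt, member_ref, budgets))
outside_score = quality_slope(
    budgets, budget_quality_curve(rag, outside_prompt, outside_ref, budgets))
```

In this toy setting the member candidate's quality climbs as the budget grows while the non-member's stays flat, so the slope separates the two. A real black-box RAG system would be noisier, and both the budgets to probe and the decision threshold would need to be calibrated on data.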
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks
