BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation
Zijun Jia, Yuanchang Ye, Sen Jia, Yiyao Qian, Haoning Wang, Baojie Chen, Diyin Tang, Jinsong Yu, Zhiyuan Wang

TL;DR
BalanceRAG introduces a joint risk calibration method for cascaded retrieval-augmented generation, optimizing when to use retrieval based on uncertainty to improve factuality and efficiency.
Contribution
It develops a novel framework for joint risk calibration in cascaded RAG systems, enabling adaptive thresholding and multi-risk control based on uncertainty scores.
Findings
BalanceRAG meets prescribed risk levels across benchmarks.
It preserves higher coverage and more correct answers.
It reduces unnecessary retrieval calls compared to always-on RAG.
Abstract
Large language models (LLMs) can improve factuality through retrieval-augmented generation (RAG), but applying RAG to every query is unnecessary when the model-only answer is reliable. This motivates cascaded RAG: each query is first handled by an LLM-only branch and escalated to a RAG fallback only if the primary branch is uncertain; the system abstains when neither branch is sufficiently trustworthy. However, calibrating such cascades stage by stage can be conservative, since the final utility depends on jointly thresholding the uncertainty of the LLM-only and RAG branches. In this work, we develop BalanceRAG to certify threshold pairs at a target risk level. Given uncertainty scores from the two branches, BalanceRAG frames each threshold pair as an operating point on a two-dimensional lattice and identifies safe operating points using sequential graphical testing. This enables risk-adaptive threshold…
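To make the cascade concrete, the following Python sketch routes a query through the LLM-only branch, the RAG fallback, or abstention given a threshold pair, and then scans a small two-dimensional grid of threshold pairs for the highest-coverage point whose empirical risk stays under a target level. It is a minimal illustration under stated assumptions, not the paper's procedure: the names `Example`, `route`, `evaluate`, and `best_empirical_point` are hypothetical, the convention that lower uncertainty means higher confidence is assumed, risk is assumed to be the error rate among answered queries, and the grid scan uses plain empirical risk in place of the sequential graphical testing that BalanceRAG uses to certify operating points.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple


@dataclass
class Example:
    """One calibration query (hypothetical container, not from the paper)."""
    u_llm: float        # uncertainty of the LLM-only answer (assumed: lower = more confident)
    u_rag: float        # uncertainty of the RAG answer (same convention)
    llm_correct: bool   # whether the LLM-only answer is correct
    rag_correct: bool   # whether the RAG answer is correct


def route(u_llm: float, u_rag: float, t_llm: float, t_rag: float) -> str:
    """Cascade decision for one query under the threshold pair (t_llm, t_rag)."""
    if u_llm <= t_llm:
        return "llm"      # primary branch is confident enough
    if u_rag <= t_rag:
        return "rag"      # escalate to the retrieval fallback
    return "abstain"      # neither branch is trusted


def evaluate(examples: List[Example], t_llm: float, t_rag: float) -> Tuple[float, float]:
    """Return (empirical risk among answered queries, coverage)."""
    answered = 0
    errors = 0
    for ex in examples:
        branch = route(ex.u_llm, ex.u_rag, t_llm, t_rag)
        if branch == "abstain":
            continue
        answered += 1
        correct = ex.llm_correct if branch == "llm" else ex.rag_correct
        errors += int(not correct)
    coverage = answered / len(examples)
    # Assumption: risk is defined as 0 when nothing is answered.
    risk = errors / answered if answered else 0.0
    return risk, coverage


def best_empirical_point(
    examples: List[Example], grid: List[float], alpha: float
) -> Optional[Tuple[float, float, float]]:
    """Scan the threshold lattice and keep the highest-coverage pair whose
    EMPIRICAL risk is at most alpha. This gives no statistical guarantee;
    it only illustrates the joint, two-dimensional search space."""
    best = None
    for t_llm, t_rag in product(grid, grid):
        risk, coverage = evaluate(examples, t_llm, t_rag)
        if risk <= alpha and (best is None or coverage > best[2]):
            best = (t_llm, t_rag, coverage)
    return best


# Toy calibration data, invented purely for illustration.
examples = [
    Example(0.1, 0.5, True, True),
    Example(0.2, 0.1, True, True),
    Example(0.8, 0.2, False, True),
    Example(0.9, 0.9, False, False),
]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
best = best_empirical_point(examples, grid, alpha=0.0)
```

Because the two thresholds are searched jointly, a loose LLM-only threshold can be traded against a strict RAG threshold and vice versa, which is the flexibility that stage-by-stage calibration gives up. A certified version would replace the `risk <= alpha` check with a valid multiple-testing procedure over the lattice.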
