BalanceRAG: Joint Risk Calibration for Cascaded Retrieval-Augmented Generation

Zijun Jia; Yuanchang Ye; Sen Jia; Yiyao Qian; Haoning Wang; Baojie Chen; Diyin Tang; Jinsong Yu; Zhiyuan Wang

arXiv:2605.20084·cs.CL·May 20, 2026

TL;DR

BalanceRAG introduces joint risk calibration for cascaded retrieval-augmented generation: it uses uncertainty scores to decide when a query actually needs retrieval, improving both factuality and efficiency.
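The cascade behind this idea can be sketched as a per-query decision rule. This is a minimal illustration, not the authors' code: the function name route, the callback rag_uncertainty, the convention that a lower score means a more trustworthy answer, and the use of <= at each threshold are all assumptions.

```python
from typing import Callable, Literal

Decision = Literal["llm", "rag", "abstain"]


def route(u_llm: float, rag_uncertainty: Callable[[], float],
          lam_llm: float, lam_rag: float) -> Decision:
    """Decide how to handle one query under the threshold pair (lam_llm, lam_rag).

    Assumes lower uncertainty means a more trustworthy answer. The RAG branch is
    only invoked (through the rag_uncertainty callback) when the LLM-only answer
    is too uncertain, so confident queries never trigger a retrieval call.
    """
    if u_llm <= lam_llm:
        return "llm"          # keep the model-only answer
    if rag_uncertainty() <= lam_rag:
        return "rag"          # escalate to the retrieval-augmented fallback
    return "abstain"          # neither branch is trustworthy enough


# Usage with a stand-in RAG branch that always reports uncertainty 0.3.
print(route(0.1, lambda: 0.3, lam_llm=0.2, lam_rag=0.5))   # -> llm
print(route(0.4, lambda: 0.3, lam_llm=0.2, lam_rag=0.5))   # -> rag
print(route(0.4, lambda: 0.3, lam_llm=0.2, lam_rag=0.25))  # -> abstain
```

Passing the RAG branch as a callback keeps retrieval lazy: its cost is paid only for escalated queries, which is where the savings over always-on RAG come from.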

Contribution

It develops a framework that jointly calibrates the uncertainty thresholds of the LLM-only and RAG branches in a cascaded RAG system, enabling risk-adaptive thresholding and control of multiple risks at once.
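One standard way to test whether a single threshold pair controls several risks is to compute, on a calibration set, a binomial tail p-value per 0/1 risk and take the maximum: a pair counts as safe only if every risk is below its level, and the maximum of valid p-values is itself valid for that union null. The sketch below assumes i.i.d. calibration queries and binary losses; the summary does not say which p-values BalanceRAG uses, and the names binom_cdf, risk_p_value, and multi_risk_p_value are hypothetical.

```python
import math
from typing import Sequence


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(Binomial(n, p) <= k), summed in log space to avoid overflow; needs 0 < p < 1."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def risk_p_value(losses: Sequence[int], alpha: float) -> float:
    """Valid p-value for H0: E[loss] >= alpha, given i.i.d. 0/1 losses."""
    return binom_cdf(sum(losses), len(losses), alpha)


def multi_risk_p_value(losses_per_risk: Sequence[Sequence[int]],
                       alphas: Sequence[float]) -> float:
    """H0: at least one risk exceeds its level. Rejecting needs every risk to pass."""
    return max(risk_p_value(l, a) for l, a in zip(losses_per_risk, alphas))


# Hypothetical losses for one threshold pair on 200 calibration queries.
errors = [1] * 4 + [0] * 196      # answer-error indicator
second = [1] * 50 + [0] * 150     # a second 0/1 loss of your choosing
print(multi_risk_p_value([errors, second], alphas=[0.05, 0.3]))
```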

Findings

01. BalanceRAG meets the prescribed risk levels across benchmarks.
02. It retains higher coverage and more correct answers.
03. It makes fewer unnecessary retrieval calls than always-on RAG.
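The quantities in these findings (coverage, errors, retrieval calls) can be measured for any threshold pair on a labeled calibration set. The sketch assumes that the error rate is counted over all queries (an abstention is not an error), that every escalation costs exactly one retrieval call, and that always-on RAG corresponds to a retrieval rate of 1.0; CalibQuery, OperatingStats, and evaluate are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CalibQuery:
    u_llm: float   # uncertainty of the LLM-only answer (lower = more confident)
    u_rag: float   # uncertainty of the RAG answer
    ok_llm: bool   # whether the LLM-only answer is correct
    ok_rag: bool   # whether the RAG answer is correct


@dataclass
class OperatingStats:
    coverage: float        # fraction of queries answered (not abstained)
    error_rate: float      # fraction of ALL queries answered incorrectly
    retrieval_rate: float  # fraction of queries escalated to RAG (always-on RAG = 1.0)


def evaluate(queries: List[CalibQuery], lam_llm: float, lam_rag: float) -> OperatingStats:
    answered = wrong = retrieved = 0
    for q in queries:
        if q.u_llm <= lam_llm:          # LLM-only branch accepted
            answered += 1
            wrong += int(not q.ok_llm)
            continue
        retrieved += 1                  # escalation costs one retrieval call
        if q.u_rag <= lam_rag:          # RAG fallback accepted
            answered += 1
            wrong += int(not q.ok_rag)
    n = len(queries)
    return OperatingStats(answered / n, wrong / n, retrieved / n)


calib = [
    CalibQuery(0.10, 0.20, True, True),    # answered by LLM-only, correct
    CalibQuery(0.60, 0.20, False, True),   # escalated, answered by RAG, correct
    CalibQuery(0.70, 0.90, False, False),  # escalated, then abstained
    CalibQuery(0.15, 0.30, False, True),   # answered by LLM-only, wrong
]
print(evaluate(calib, lam_llm=0.3, lam_rag=0.5))
# -> OperatingStats(coverage=0.75, error_rate=0.25, retrieval_rate=0.5)
```

In this toy example only half of the queries reach the retriever, whereas always-on RAG would call it for all four.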

Abstract

Large language models (LLMs) can enhance factuality via retrieval-augmented generation (RAG), but applying RAG to every query is unnecessary when the model-only answer is reliable. This motivates cascaded RAG: each query is first handled by an LLM-only branch, escalated to a RAG fallback only if the primary branch is uncertain, and abstained from when neither branch is sufficiently trustworthy. However, calibrating such cascades stage by stage may be conservative, since the final utility depends on joint uncertainty thresholding of LLM-only and RAG. In this work, we develop BalanceRAG to certify threshold pairs at a target risk level. Given uncertainty scores from the two branches, BalanceRAG frames each threshold pair as an operating point on a two-dimensional lattice and identifies safe operating points using sequential graphical testing. This enables risk-adaptive threshold…
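The lattice-plus-graphical-testing step can be sketched with the sequentially rejective graphical procedure of Bretz et al. (2009), which controls the family-wise error rate at level delta. This summary does not specify BalanceRAG's graph, so the one below is a hypothetical design: all initial weight sits on the most conservative threshold pair (index 0 on both axes) and flows in equal shares toward more permissive neighbours once a pair is certified. The p-values are assumed to come from a per-pair risk test; the ones in the usage example are made up.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (index into the LLM-only grid, index into the RAG grid)


def lattice_graph(n1: int, n2: int):
    """Hypothetical graph: weight starts at (0, 0) and flows to (i+1, j) and (i, j+1)."""
    weights = {(i, j): 0.0 for i in range(n1) for j in range(n2)}
    weights[(0, 0)] = 1.0
    trans: Dict[Node, Dict[Node, float]] = {}
    for (i, j) in weights:
        nbrs = [v for v in ((i + 1, j), (i, j + 1)) if v[0] < n1 and v[1] < n2]
        trans[(i, j)] = {v: 1.0 / len(nbrs) for v in nbrs}
    return weights, trans


def graphical_test(p: Dict[Node, float], weights: Dict[Node, float],
                   trans: Dict[Node, Dict[Node, float]], delta: float) -> List[Node]:
    """Graphical procedure; H0 at v: "the threshold pair at v violates the target risk".

    Each rejection certifies a pair as safe, with family-wise error rate <= delta.
    """
    w = dict(weights)
    g = {a: dict(out) for a, out in trans.items()}
    active = set(p)
    rejected: List[Node] = []
    while True:
        cands = [v for v in active if w[v] > 0 and p[v] <= w[v] * delta]
        if not cands:
            return rejected
        v = min(cands, key=lambda u: p[u])
        rejected.append(v)
        active.remove(v)
        # Pass v's weight along its outgoing edges.
        for u in active:
            w[u] += w[v] * g[v].get(u, 0.0)
        # Reroute every path a -> v -> b into a direct edge a -> b.
        new_g: Dict[Node, Dict[Node, float]] = {}
        for a in active:
            new_g[a] = {}
            for b in active:
                if a == b:
                    continue
                loop = g[a].get(v, 0.0) * g[v].get(a, 0.0)
                num = g[a].get(b, 0.0) + g[a].get(v, 0.0) * g[v].get(b, 0.0)
                new_g[a][b] = num / (1.0 - loop) if loop < 1.0 else 0.0
        g = new_g


lam_llm_grid = [0.1, 0.2, 0.3]   # hypothetical threshold grids, ascending
lam_rag_grid = [0.2, 0.4, 0.6]
p_values = {(0, 0): 0.001, (1, 0): 0.01, (0, 1): 0.02, (2, 0): 0.5, (1, 1): 0.3,
            (0, 2): 0.04, (2, 1): 0.9, (1, 2): 0.9, (2, 2): 0.9}   # made-up p-values
w0, g0 = lattice_graph(len(lam_llm_grid), len(lam_rag_grid))
safe = graphical_test(p_values, w0, g0, delta=0.1)
print(sorted((lam_llm_grid[i], lam_rag_grid[j]) for i, j in safe))
# -> [(0.1, 0.2), (0.1, 0.4), (0.2, 0.2)]
```

Here the pair at index (0, 2) is not certified even though its p-value is 0.04: by the time it is reached it holds only a quarter of the error budget. Pointing the graph's weight toward the pairs one most wants to use is the main design lever in this kind of procedure.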
