CascadeDebate: Multi-Agent Deliberation for Cost-Aware LLM Cascades

Raeyoung Chang; Dongwook Kwon; Jisoo Lee; Nikhil Verma

arXiv:2604.12262 · cs.CL · April 15, 2026

TL;DR

CascadeDebate introduces a multi-agent deliberation framework within LLM cascades, dynamically balancing accuracy and cost by resolving ambiguities internally before escalating to more expensive models or humans.

Contribution

The paper presents an architecture that integrates multi-agent deliberation at each cascade stage, improving efficiency and accuracy over traditional single-model cascades.

Findings

01 Outperforms strong single-model cascades and multi-agent systems by up to 26.75%.

02 Dynamic compute scaling based on query difficulty improves efficiency.

03 Online threshold optimization significantly boosts accuracy and adaptability.

Abstract

Cascaded LLM systems coordinate models of varying sizes with human experts to balance accuracy, cost, and abstention under uncertainty. However, single-model tiers at each stage often struggle with ambiguous queries, triggering premature escalations to costlier models or experts due to under-confidence and inefficient compute scaling. CascadeDebate addresses this gap by inserting multi-agent deliberation directly at each tier's escalation boundary. Confidence-based routers activate lightweight agent ensembles only for uncertain cases, enabling consensus-driven resolution of ambiguities internally without invoking higher-cost upgrades. Our unified architecture alternates single-model inference with selective multi-agent deliberation across model scales, culminating in human experts as the final fallback. This design scales test-time compute dynamically according to query difficulty.…
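
The control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Tier` fields, both thresholds, the trace labels, and the use of a simple majority vote as a stand-in for deliberation are all assumptions made for illustration, and each "model" is just a callable returning an answer and a confidence score.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: a model maps a query to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Tier:
    name: str
    model: Model                # single-model inference for this tier
    agents: List[Model]         # lightweight ensemble, used only when uncertain
    accept_threshold: float     # hypothetical: accept the single-model answer at or above this
    consensus_threshold: float  # hypothetical: fraction of agents that must agree


def deliberate(agents: List[Model], query: str) -> Tuple[str, float]:
    """Stand-in for deliberation: majority vote plus the agreement ratio."""
    votes = Counter(agent(query)[0] for agent in agents)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(agents)


def run_cascade(
    tiers: List[Tier], query: str, human: Callable[[str], str]
) -> Tuple[str, List[str]]:
    """Try each tier in order; fall back to the human expert last."""
    trace: List[str] = []
    for tier in tiers:
        answer, confidence = tier.model(query)
        if confidence >= tier.accept_threshold:
            # Confident single-model answer: no extra compute is spent.
            trace.append(f"{tier.name}:single")
            return answer, trace
        if tier.agents:
            # Uncertain case: try to resolve it inside the tier first.
            answer, agreement = deliberate(tier.agents, query)
            if agreement >= tier.consensus_threshold:
                trace.append(f"{tier.name}:debate")
                return answer, trace
        # No consensus at this tier: escalate to the next, costlier one.
        trace.append(f"{tier.name}:escalate")
    trace.append("human")
    return human(query), trace
```

In this sketch, easy queries cost one model call, ambiguous ones cost one call plus the ensemble, and only queries that no tier can resolve reach the human expert, which is one way to read the abstract's claim that compute scales with query difficulty.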
