CascadeDebate: Multi-Agent Deliberation for Cost-Aware LLM Cascades
Raeyoung Chang, Dongwook Kwon, Jisoo Lee, Nikhil Verma

TL;DR
CascadeDebate introduces a multi-agent deliberation framework within LLM cascades, dynamically balancing accuracy and cost by resolving ambiguities internally before escalating to more expensive models or humans.
Contribution
It presents a novel architecture that integrates multi-agent deliberation at each cascade stage, improving efficiency and accuracy over traditional single-model cascades.
Findings
Outperforms strong single-model cascades and multi-agent systems by up to 26.75%.
Dynamic compute scaling based on query difficulty improves efficiency.
Online threshold optimization significantly boosts accuracy and adaptability.
Abstract
Cascaded LLM systems coordinate models of varying sizes with human experts to balance accuracy, cost, and abstention under uncertainty. However, single-model tiers at each stage often struggle with ambiguous queries, triggering premature escalations to costlier models or experts due to under-confidence and inefficient compute scaling. CascadeDebate addresses this gap by inserting multi-agent deliberation directly at each tier's escalation boundary. Confidence-based routers activate lightweight agent ensembles only for uncertain cases, enabling consensus-driven resolution of ambiguities internally without invoking higher-cost upgrades. Our unified architecture alternates single-model inference with selective multi-agent deliberation across model scales, culminating in human experts as the final fallback. This design scales test-time compute dynamically according to query difficulty.…
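The routing described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `Tier` fields, thresholds, per-call costs, and the debate protocol are all hypothetical choices. In the protocol assumed here, agents revise their answers after seeing their peers' previous answers, and the majority share serves as the consensus score. The paper's actual routers and deliberation procedure may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces: a single model returns (answer, confidence in [0, 1]);
# a debate agent sees the query plus peers' previous answers and returns an answer.
SingleModel = Callable[[str], Tuple[str, float]]
DebateAgent = Callable[[str, List[str]], str]


@dataclass
class Tier:
    name: str
    single: SingleModel
    agents: List[DebateAgent]     # lightweight ensemble, assumed non-empty
    confidence_threshold: float   # accept the single-model answer at or above this
    consensus_threshold: float    # accept the ensemble answer at or above this agreement
    single_cost: float = 1.0      # hypothetical cost units
    agent_cost: float = 1.0       # charged per agent per debate round
    max_rounds: int = 2


def deliberate(tier: Tier, query: str) -> Tuple[str, float, float]:
    """Run up to max_rounds of debate; return (majority answer, agreement, cost)."""
    peers: List[str] = []
    cost = 0.0
    answer, agreement = "", 0.0
    for _ in range(tier.max_rounds):
        # Each agent answers while seeing the previous round's answers.
        peers = [agent(query, peers) for agent in tier.agents]
        cost += tier.agent_cost * len(tier.agents)
        answer, votes = Counter(peers).most_common(1)[0]
        agreement = votes / len(peers)
        if agreement >= tier.consensus_threshold:
            break  # consensus reached, so stop debating early
    return answer, agreement, cost


def run_cascade(query: str, tiers: List[Tier],
                human: Callable[[str], str]) -> Tuple[str, str, float]:
    """Return (answer, resolved_by, total_cost); a human is the final fallback."""
    total = 0.0
    for tier in tiers:
        answer, confidence = tier.single(query)
        total += tier.single_cost
        if confidence >= tier.confidence_threshold:
            return answer, f"{tier.name}/single", total
        # Uncertain case: deliberate at this tier before escalating.
        answer, agreement, cost = deliberate(tier, query)
        total += cost
        if agreement >= tier.consensus_threshold:
            return answer, f"{tier.name}/debate", total
    return human(query), "human", total  # human cost is not modeled here


small = Tier(
    name="small",
    single=lambda q: ("4", 0.9) if q == "2+2" else ("?", 0.3),
    agents=[lambda q, p: "B", lambda q, p: "B", lambda q, p: "C"],
    confidence_threshold=0.8,
    consensus_threshold=0.6,
)
print(run_cascade("2+2", [small], human=lambda q: "expert"))   # ('4', 'small/single', 1.0)
print(run_cascade("hard", [small], human=lambda q: "expert"))  # ('B', 'small/debate', 4.0)
```

The sketch shows the cost behavior the abstract describes. A confident query costs one single-model call. An uncertain query adds one or more debate rounds at the same tier. Escalation to a larger tier, or to the human expert, happens only when the ensemble also fails to reach consensus.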
