C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning

Antonios Valkanas; Soumyasundar Pal; Pavel Rumiantsev; Yingxue Zhang; Mark Coates

arXiv:2511.07396·cs.LG·November 11, 2025



TL;DR

C3PO introduces a self-supervised cascade optimization framework for large language models that controls inference costs probabilistically while maintaining high reasoning accuracy, without requiring labeled data.

Contribution

The paper presents a label-free method for optimizing LLM cascades, with theoretical cost-control guarantees and improved empirical performance on reasoning benchmarks.

Findings

01 State-of-the-art accuracy on reasoning benchmarks.

02 Effective cost control with probabilistic guarantees.

03 Outperforms existing cascade methods in accuracy and efficiency.

Abstract

Large language models (LLMs) have achieved impressive results on complex reasoning tasks, but their high inference cost remains a major barrier to real-world deployment. A promising solution is to use cascaded inference, where small, cheap models handle easy queries, and only the hardest examples are escalated to more powerful models. However, existing cascade methods typically rely on supervised training with labeled data, offer no theoretical generalization guarantees, and provide limited control over test-time computational cost. We introduce C3PO (Cost Controlled Cascaded Prediction Optimization), a self-supervised framework for optimizing LLM cascades under probabilistic cost constraints. By focusing on minimizing regret with respect to the most powerful model (MPM), C3PO avoids the need for labeled data by constructing a cascade using only unlabeled model outputs. It leverages…
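
The cascaded inference idea described in the abstract can be illustrated with a minimal sketch. This is not the C3PO algorithm: the confidence-threshold routing rule, the `Stage` type, the toy models, the cost units, and the `budget` parameter are all assumptions made for illustration. The sketch only shows how a cheap model can answer easy queries, how hard ones escalate to the most powerful model (MPM), and how two label-free quantities can be measured from unlabeled model outputs: disagreement with the MPM and the fraction of queries that exceed a cost budget.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A model maps a query to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Stage:
    name: str
    cost: float  # hypothetical per-query cost units
    model: Model


def run_cascade(query: str, stages: Sequence[Stage],
                thresholds: Sequence[float]) -> Tuple[str, float, str]:
    """Try stages from cheapest to most powerful; stop at the first confident one.

    thresholds[i] is the confidence needed to accept stage i's answer.
    The last stage (the MPM) always answers, so it needs no threshold.
    Returns (answer, total cost paid, name of the answering stage).
    """
    total_cost = 0.0
    for i, stage in enumerate(stages):
        answer, confidence = stage.model(query)
        total_cost += stage.cost  # every stage that is called is paid for
        if i == len(stages) - 1 or confidence >= thresholds[i]:
            return answer, total_cost, stage.name
    raise ValueError("stages must not be empty")


def summarize(queries: Sequence[str], stages: Sequence[Stage],
              thresholds: Sequence[float], budget: float) -> Tuple[float, float]:
    """Label-free summary: (disagreement rate with the MPM, over-budget rate)."""
    mpm = stages[-1].model
    disagreements = 0
    over_budget = 0
    for q in queries:
        answer, cost, _ = run_cascade(q, stages, thresholds)
        disagreements += answer != mpm(q)[0]  # no ground-truth label needed
        over_budget += cost > budget
    n = len(queries)
    return disagreements / n, over_budget / n


def small_model(q: str) -> Tuple[str, float]:
    # Toy stand-in: confident only on short queries.
    return (q.upper(), 0.9) if len(q) < 10 else ("?", 0.3)


def large_model(q: str) -> Tuple[str, float]:
    # Toy stand-in for the most powerful model.
    return q.upper(), 0.99


STAGES = [Stage("small", 1.0, small_model), Stage("large", 10.0, large_model)]
QUERIES = ["2+2", "capital?", "prove the lemma in full", "3*3"]

if __name__ == "__main__":
    for threshold in (0.5, 0.2):
        print(threshold, summarize(QUERIES, STAGES, [threshold], budget=5.0))
```

In this toy setup, a threshold of 0.5 escalates the one long query, so the cascade always agrees with the MPM but one query in four exceeds the budget of 5.0. Lowering the threshold to 0.2 keeps every query within budget at the price of one disagreement with the MPM. That trade-off between cost and agreement is the kind of quantity a cascade optimizer has to balance.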


Videos

C3PO: Optimized Large Language Model Cascades with Probabilistic Cost Constraints for Reasoning · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications