Conformal Constrained Policy Optimization for Cost-Effective LLM Agents
Wenwen Si, Sooyong Jang, Insup Lee, Osbert Bastani

TL;DR
This paper introduces Conformal Constrained Policy Optimization (CCPO), a method that reduces the cost of deploying large language model agents by orchestrating multiple models with different cost/accuracy tradeoffs, while using conformal prediction to guarantee a user-specified level of reliability.
Contribution
The paper presents CCPO, a novel training paradigm that combines constrained policy optimization, off-policy reinforcement learning, and conformal prediction for cost-effective and reliable LLM agent deployment.
Findings
Achieves up to 30% cost reduction on multi-hop QA benchmarks.
Maintains reliability guarantees with conformal prediction.
Provides a practical framework for cost-effective LLM deployment.
Abstract
While large language models (LLMs) have recently made tremendous progress towards solving challenging AI problems, they have done so at increasingly steep computational and API costs. We propose a novel strategy where we combine multiple LLM models with varying cost/accuracy tradeoffs in an agentic manner, where models and tools are run in sequence as determined by an orchestration model to minimize cost subject to a user-specified level of reliability; this constraint is formalized using conformal prediction to provide guarantees. To solve this problem, we propose Conformal Constrained Policy Optimization (CCPO), a training paradigm that integrates constrained policy optimization with off-policy reinforcement learning and recent advances in online conformal prediction. CCPO jointly optimizes a cost-aware policy (score function) and an adaptive threshold. Across two multi-hop question…
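The adaptive threshold mentioned in the abstract can be illustrated with a generic online conformal update. The following is a minimal sketch, not the CCPO algorithm itself: it assumes a two-model cascade in which a cheap model returns a confidence score in [0, 1], the answer is accepted when the score clears a threshold `tau`, and otherwise the query is escalated to an expensive model whose answer is treated as the reference. The learned policy is replaced here by a fixed synthetic score, and all names (`run_cascade`, `synthetic_stream`, `cheap_cost`, `big_cost`, `eta`) are hypothetical.

```python
import random


def synthetic_stream(n, seed=0):
    """Yield (score, cheap_is_correct) pairs from a toy calibrated scorer.

    Assumption: the cheap model is correct with probability equal to its score.
    """
    rng = random.Random(seed)
    for _ in range(n):
        score = rng.random()
        yield score, rng.random() < score


def run_cascade(stream, alpha=0.1, eta=0.05, tau=0.5,
                cheap_cost=1.0, big_cost=10.0):
    """Run a two-model cascade with an online conformal-style threshold.

    alpha: target long-run error rate (the user-specified reliability level).
    eta:   step size of the threshold update.
    tau:   initial acceptance threshold on the cheap model's score.
    Returns (empirical error rate, average cost per query, final tau).
    """
    errors = 0
    total_cost = 0.0
    n = 0
    for score, cheap_is_correct in stream:
        n += 1
        total_cost += cheap_cost  # the cheap model is always queried first
        if score >= tau:
            # Accept the cheap answer; an error occurs if it is wrong.
            err = 0 if cheap_is_correct else 1
        else:
            # Escalate. Assumption: the expensive answer counts as correct.
            total_cost += big_cost
            err = 0
        errors += err
        # Raise tau after an error (more cautious), lower it slightly otherwise.
        tau += eta * (err - alpha)
    return errors / n, total_cost / n, tau


if __name__ == "__main__":
    err_rate, avg_cost, tau = run_cascade(synthetic_stream(5000))
    print(f"error rate={err_rate:.3f}, avg cost={avg_cost:.2f}, tau={tau:.3f}")
```

Because each step moves `tau` by `eta * (err - alpha)`, the gap between the empirical error rate and `alpha` equals the total drift of `tau` divided by `eta * n`, so the error rate approaches the target as long as `tau` stays bounded. In this sketch only the threshold adapts; the abstract describes CCPO as jointly optimizing the score function and the threshold.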
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
