CARROT: A Cost Aware Rate Optimal Router

Seamus Somerstep; Felipe Maia Polo; Allysson Flavio Melo de Oliveira; Prattyush Mangal; Mírian Silva; Onkar Bhardwaj; Mikhail Yurochkin; Subha Maity

arXiv:2502.03261 · stat.ML · May 21, 2025

TL;DR

This paper introduces CARROT, a cost-aware routing algorithm for large language models that optimally balances cost and accuracy, supported by a new dataset and empirical validation against existing benchmarks.

Contribution

The paper proposes CARROT, a novel routing method that predicts cost and accuracy for model selection, and introduces the SPROUT dataset for benchmarking LLM routing performance.

Findings

1. A router that predicts both cost and accuracy for each query, as CARROT does, can be minimax rate optimal.
2. Empirical results show that CARROT outperforms alternative routers on SPROUT and other benchmarks.
3. The SPROUT dataset enables comprehensive evaluation of LLM routing strategies.

Abstract

With the rapid growth in the number of Large Language Models (LLMs), there has been recent interest in LLM routing, or directing queries to the cheapest LLM that can deliver a suitable response. We conduct a minimax analysis of the routing problem, providing a lower bound and finding that a simple router that predicts both cost and accuracy for each question can be minimax optimal. Inspired by this, we introduce CARROT, a Cost AwaRe Rate Optimal rouTer that selects a model based on estimates of the models' cost and performance. Alongside CARROT, we also introduce the Smart Price-aware ROUTing (SPROUT) dataset to facilitate routing on a wide spectrum of queries with the latest state-of-the-art LLMs. Using SPROUT and prior benchmarks such as Routerbench and open-LLM-leaderboard-v2, we empirically validate CARROT's performance against several alternative routers.
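The abstract's core idea, estimating each candidate model's accuracy and cost for a query and then selecting a model from those estimates, can be illustrated with a small plug-in sketch. Everything here is an assumption for illustration: the linear score "predicted accuracy minus λ times predicted cost", the names `ModelEstimate`, `route`, and `sweep`, and the toy numbers are hypothetical and are not the paper's exact objective or estimators. In practice the per-model estimates would come from trained predictors (for example, fit on routing data such as SPROUT); here they are passed in directly.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ModelEstimate:
    """Hypothetical per-query predictions for one candidate LLM."""
    name: str
    accuracy: float  # predicted probability that this model answers the query well
    cost: float      # predicted cost of answering the query (e.g. in USD)


def route(estimates: Sequence[ModelEstimate], lam: float) -> ModelEstimate:
    """Plug-in rule under an ASSUMED linear trade-off: return the candidate
    that maximizes predicted accuracy minus lam times predicted cost."""
    if not estimates:
        raise ValueError("need at least one candidate model")
    if lam < 0:
        raise ValueError("lam must be non-negative")
    # max() returns the first candidate among ties.
    return max(estimates, key=lambda e: e.accuracy - lam * e.cost)


def sweep(
    queries: Sequence[Sequence[ModelEstimate]], lams: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Route every query at each trade-off value and return
    (lam, mean predicted accuracy, total predicted cost) for each lam."""
    if not queries:
        raise ValueError("need at least one query")
    points = []
    for lam in lams:
        chosen = [route(q, lam) for q in queries]
        mean_acc = sum(e.accuracy for e in chosen) / len(chosen)
        total_cost = sum(e.cost for e in chosen)
        points.append((lam, mean_acc, total_cost))
    return points


if __name__ == "__main__":
    # Toy estimates for two queries and two hypothetical models.
    queries = [
        [ModelEstimate("small", 0.90, 0.001), ModelEstimate("large", 0.95, 0.030)],
        [ModelEstimate("small", 0.40, 0.002), ModelEstimate("large", 0.85, 0.040)],
    ]
    for lam in (0.0, 5.0, 50.0):
        print(lam, [route(q, lam).name for q in queries])
    print(sweep(queries, (0.0, 5.0, 50.0)))
```

With these toy numbers, λ = 0 sends both queries to `large`, λ = 5 keeps `large` only for the second (harder) query, and λ = 50 sends both to `small`. Sweeping λ in this way traces a cost-accuracy curve, which is one way to compare routers at matched cost.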

Peer Reviews

Decision: ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 3

Strengths

1. The performance of CARROT is competitive.

Weaknesses

1. For better or worse, the paper is heavily packed with formal theorems and minimax analyses. While this mathematical framing is thorough, it sometimes overshadows the practical contribution. It would be helpful if the authors clearly indicated where the proof of each theorem is located.
2. What is the difference between CARROT and prior routing work, i.e., what is the main novelty here? I believe similar multi-model risk estimators have been used in earlier “model selection” and “LLM routing” papers,

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. This paper formulates the LLM routing problem within a minimax rate-optimality framework. The technical development seems solid.
2. How to balance achieved response quality against incurred inference cost is a critical problem in modern LLM serving systems.
3. The evaluation results are impressive. For example, at 30% of the cost, CARROT matches or exceeds the performance of GPT-4o on each benchmark.

Weaknesses

1. Some advanced routing baselines are neither compared against nor discussed, for example: Ding, Dujian, et al. "BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute." Forty-second International Conference on Machine Learning.
2. This paper relies on a sequence of assumptions. It would add great value to this work if the authors discussed to what extent these assumptions hold in practice.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. The paper effectively demonstrates the empirical value of its approach. The experiments show that a cost-aware router like CARROT can achieve a more efficient cost-performance trade-off than baselines, particularly binary routers and non-predictive routers.
2. The paper provides a theoretical justification for its method in Section 3. While the "plug-in" concept of estimating and then optimizing is intuitive, the authors formally prove that this simple approach is, in fact, minimax rate-opti

Weaknesses

1. The structure of the introductory sections feels somewhat unconventional. The second paragraph is very long. It would be clearer to split this section into "Introduction" and "Related Work/Background". Additionally, the current Section 2 and Section 3 are intrinsically linked, defining and then analyzing the same problem. These could be combined into a single, comprehensive "Problem Formulation and Theoretical Analysis" section to improve readability.
2. A concern arises from the construction

Reviewer 04 · Rating 4 · Confidence 3

Strengths

1. I feel LLM routing is a very interesting and practical problem.
2. This work contains both theoretical analysis and empirical results, which look complete to me.
3. The empirical results show the strong performance of the proposed CARROT algorithm over baselines on several benchmarks.
4. I found the presentation of this work clear, and most details are included in the Appendix.

Weaknesses

1. I feel the novelty of the theoretical derivation is limited overall. Reading the proof, I feel the analysis is adapted from existing statistical learning theory on binary classification and risk control, and there is already a fairly comprehensive theoretical literature on lower- and upper-bound derivations for the problem setting used in this work. Could the authors elaborate on the novelty of their theory?
2. I feel this work could include more baselines for comparison, as there are so

Reviewer 05 · Rating 2 · Confidence 4

Strengths

The paper provides a clear mathematical framework for the routing problem, with a principled treatment of routing, an increasingly practical and relevant problem.

Weaknesses

- Limited novelty. The main idea of routing queries between models based on cost or accuracy is well established. The theoretical analysis adds formality but does not lead to a substantively new method or insight.
- Model–prompt coupling not addressed. In practice, system prompts are heavily engineered for specific models. Swapping models without re-tuning instructions is rarely valid, and the mild adjustments through per-model chat templates seem insufficient to ensure fair comparison or practi

Code & Models

Repositories

codelion/adaptive-classifier (PyTorch)
