CONCUR: A Framework for Continual Constrained and Unconstrained Routing
Peter Baile Chen, Weiyue Li, Dan Roth, Michael Cafarella, Samuel Madden, Jacob Andreas

TL;DR
CONCUR is a modular continual routing framework that efficiently adapts to new strategies and captures task complexity, outperforming existing methods in accuracy and cost across diverse tasks.
Contribution
It introduces a modular predictor-based routing system supporting both constrained and unconstrained routing with low retraining overhead.
Findings
Outperforms existing routing techniques in accuracy and inference cost.
Supports seamless incorporation of new strategies with minimal retraining.
Reduces training costs in continual learning scenarios.
Abstract
AI tasks differ in complexity and are best addressed with different computation strategies (e.g., combinations of models and decoding methods). Hence, an effective routing system that maps tasks to the appropriate strategies is crucial. Most prior methods build the routing framework by training a single model across all strategies, which demands full retraining whenever new strategies appear and leads to high overhead. Attempts at such continual routing, however, often face difficulties with generalization. Prior models also typically use a single input representation, limiting their ability to capture the full complexity of the routing problem and leading to sub-optimal routing decisions. To address these gaps, we propose CONCUR, a continual routing framework that supports both constrained and unconstrained routing (i.e., routing with or without a budget). Our modular design trains a…
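As a minimal sketch of routing with a budget, assume each strategy comes with a predicted accuracy and a predicted cost for the task at hand. The `Estimate` type, the per-task budget, the strategy names, and all numbers below are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Estimate:
    """Hypothetical per-strategy prediction for a single task."""
    strategy: str
    accuracy: float  # predicted probability of a correct answer
    cost: float      # predicted cost in arbitrary units (assumption)


def route_with_budget(estimates: Sequence[Estimate], budget: float) -> Optional[str]:
    """Return the most accurate strategy whose predicted cost fits the budget.

    Ties on accuracy are broken in favor of the cheaper strategy.
    Returns None when no strategy is affordable.
    """
    feasible = [e for e in estimates if e.cost <= budget]
    if not feasible:
        return None
    best = max(feasible, key=lambda e: (e.accuracy, -e.cost))
    return best.strategy


# Made-up predictions for one task.
estimates = [
    Estimate("small-model/greedy", accuracy=0.62, cost=1.0),
    Estimate("small-model/self-consistency", accuracy=0.71, cost=5.0),
    Estimate("large-model/greedy", accuracy=0.80, cost=8.0),
]
```

With a budget of 6.0 this sketch selects `small-model/self-consistency`. Raising the budget to 10.0 makes `large-model/greedy` affordable, and a budget below 1.0 leaves no feasible strategy.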
Peer Reviews
Decision·ICLR 2026 Poster
1. The paper addresses the timely problem of routing efficiently among a growing variety of models to solve a task with maximum accuracy and efficiency (measured in FLOPs). 2. The proposed method of learning a model-specific predictor without retraining the strategy-prediction backbone makes routing to new models seamless, as the experiments in the paper also verify.
1. The authors use FLOPs to measure and learn efficiency when choosing between models. Are the FLOPs measured by counting the number of multiply-add operations? If so, the proposed method will not capture the efficiency gains of quantized models, which can be much more efficient while achieving similar accuracy (most models are now released with lightweight quantized versions). 2. It is unclear to me how the base strategy predictor performs when the number of input stra
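To illustrate the reviewer's first concern, here is a sketch of FLOP counting by multiply-add operations for a single dense layer. It assumes the common convention of 2 FLOPs per multiply-accumulate. Whether the paper counts FLOPs this way is exactly what the reviewer asks, and the function names and layer sizes are hypothetical. The count depends only on tensor shapes, so a quantized copy of the same layer gets the same FLOP count even though its weights occupy less memory.

```python
def linear_layer_flops(batch: int, in_features: int, out_features: int,
                       bias: bool = True) -> int:
    """FLOPs for y = x @ W (+ b), counting each multiply-accumulate as 2 FLOPs."""
    macs = batch * in_features * out_features
    flops = 2 * macs
    if bias:
        flops += batch * out_features  # one addition per output element
    return flops


def weight_bytes(in_features: int, out_features: int, bits_per_weight: int) -> int:
    """Memory footprint of the weight matrix at a given precision."""
    return in_features * out_features * bits_per_weight // 8


# Same layer shape at two precisions: the FLOP count cannot tell them apart.
flops_fp16 = linear_layer_flops(1, 4096, 4096)
flops_int4 = linear_layer_flops(1, 4096, 4096)
bytes_fp16 = weight_bytes(4096, 4096, 16)
bytes_int4 = weight_bytes(4096, 4096, 4)
```

Here `flops_fp16` and `flops_int4` are identical, while `bytes_int4` is a quarter of `bytes_fp16`. A cost signal based purely on operation counts would therefore treat both variants as equally expensive.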
1. The modular predictor design directly solves the retraining overhead issue of existing methods. Integrating new strategies with minimal cost aligns with real-world needs (e.g., frequent updates of language models), making it highly scalable. 2. By combining general-purpose and task-specific representations, CONCUR captures both universal patterns (via general embeddings) and task/strategy-specific nuances (via learnable representations), which is empirically shown to improve routing decision
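The modularity described in this review can be sketched as a registry of independent per-strategy predictors on top of a shared, frozen representation. Everything here is an assumption made for illustration: the `ModularRouter` class, the toy `embed` function standing in for a frozen embedding model, and the hand-written predictors. It is not the paper's implementation.

```python
from typing import Callable, Dict, List, Tuple

Embedding = List[float]
# A predictor maps a task embedding to (predicted accuracy, predicted cost).
Predictor = Callable[[Embedding], Tuple[float, float]]


def embed(task: str) -> Embedding:
    """Toy stand-in for a frozen general-purpose embedding model."""
    n_words = float(len(task.split()))
    n_digits = float(sum(ch.isdigit() for ch in task))
    return [n_words, n_digits]


class ModularRouter:
    """Holds one predictor per strategy; predictors never share parameters."""

    def __init__(self) -> None:
        self.predictors: Dict[str, Predictor] = {}

    def add_strategy(self, name: str, predictor: Predictor) -> None:
        """Register a new strategy without touching existing predictors."""
        self.predictors[name] = predictor

    def predict(self, task: str) -> Dict[str, Tuple[float, float]]:
        x = embed(task)
        return {name: p(x) for name, p in self.predictors.items()}


router = ModularRouter()
# Hypothetical predictor: accuracy drops as the task contains more digits.
router.add_strategy("small-model", lambda x: (max(0.0, 0.8 - 0.1 * x[1]), 1.0))
before = router.predict("What is 17 * 23?")

# A new strategy arrives later; only its own predictor is added.
router.add_strategy("large-model", lambda x: (0.9, 8.0))
after = router.predict("What is 17 * 23?")
```

The prediction for `small-model` is unchanged after `large-model` is registered. That is the property that lets new strategies be added without retraining the existing predictors.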
1. The general-purpose representation relies on a frozen pre-trained embedding model (ALL-MPNET-BASE-v2). The paper does not evaluate how changes to this model (e.g., using a smaller/larger embedding model) affect predictor performance, raising concerns about its robustness to embedding choice. 2. The unconstrained routing uses a weight to balance accuracy and cost, but the paper does not analyze how different weight values (e.g., weight=0.1 for cost prioritization vs. weight=0.9 for accuracy pr
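The sensitivity analysis the reviewer asks for can be sketched as a sweep over the trade-off weight. The utility form `weight * accuracy - (1 - weight) * cost`, the cost normalization to [0, 1], and the candidate numbers are assumptions made for illustration. The source does not state the paper's exact objective.

```python
from typing import Dict, Tuple

# Hypothetical (predicted accuracy, normalized cost) per strategy for one task.
CANDIDATES: Dict[str, Tuple[float, float]] = {
    "cheap": (0.60, 0.10),
    "mid": (0.75, 0.40),
    "big": (0.85, 0.90),
}


def route_unconstrained(candidates: Dict[str, Tuple[float, float]],
                        weight: float) -> str:
    """Pick the strategy with the highest weighted utility.

    weight near 0 prioritizes low cost; weight near 1 prioritizes accuracy.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    def utility(name: str) -> float:
        accuracy, cost = candidates[name]
        return weight * accuracy - (1.0 - weight) * cost

    return max(candidates, key=utility)


# Sweep the weight to see where the routing decision flips.
sweep = {w: route_unconstrained(CANDIDATES, w) for w in (0.1, 0.7, 0.9)}
```

On these made-up numbers the decision moves from `cheap` at weight 0.1, to `mid` at 0.7, to `big` at 0.9. Reporting such a sweep on real predictions would show how robust the routing is to the choice of weight.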
- The paper addresses an important and practical problem. Routing effectively among multiple LLMs and decoding strategies under varying cost and performance constraints is increasingly relevant to the efficient deployment of large reasoning models, especially in settings where computational budgets and task distributions evolve over time. The formulation of both continual and non-continual routing scenarios adds furth
- Although the paper repeatedly discusses routing in both continual and non-continual settings, the problem itself is never formally defined. It remains unclear what exactly constitutes a routing input and output, and how the optimization objective is formulated in mathematical terms. The framework is presented primarily at a conceptual level, which makes it difficult to precisely understand the problem scope and to reproduce the approach in other contexts. As a result, the paper may not be self
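For orientation, one possible formalization of the routing problem is sketched below. It is an assumption made for illustration, not the paper's definition. The symbols $\mathcal{X}$, $\mathcal{S}_t$, $\hat{a}$, $\hat{c}$, $w$, and $B$ are introduced here and do not come from the source.

```latex
% Input: a task x \in \mathcal{X}. Output: a strategy s \in \mathcal{S}_t,
% where \mathcal{S}_t is the set of strategies available at time t.
% \hat{a}(x, s) \in [0, 1]: predicted accuracy of strategy s on task x.
% \hat{c}(x, s) \ge 0:      predicted cost of strategy s on task x.

% Unconstrained routing with trade-off weight w \in [0, 1]:
\pi_{\mathrm{unc}}(x) = \arg\max_{s \in \mathcal{S}_t}
    \; w \, \hat{a}(x, s) - (1 - w) \, \hat{c}(x, s)

% Constrained routing with a per-task budget B:
\pi_{\mathrm{con}}(x) = \arg\max_{s \in \mathcal{S}_t \,:\, \hat{c}(x, s) \le B}
    \; \hat{a}(x, s)

% Continual setting: the strategy set grows over time,
\mathcal{S}_{t+1} = \mathcal{S}_t \cup \{ s_{\mathrm{new}} \},
% and only \hat{a}(\cdot, s_{\mathrm{new}}) and \hat{c}(\cdot, s_{\mathrm{new}})
% need to be learned.
```

Under this reading, a routing input is a task, a routing output is a single strategy, and the non-continual setting is the special case in which $\mathcal{S}_t$ is fixed.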
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
