Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows
Yinwei Dai, Zhuofu Chen, Anand Iyer, and Ravi Netravali

TL;DR
Aragog is a system that dynamically adapts model routing configurations during workflow execution, significantly improving scalability and efficiency in serving complex agentic workflows with large language models.
Contribution
Aragog decouples configuration identification from runtime scheduling, so a request's configuration can be adapted cost-effectively while its workflow executes.
Findings
Increases maximum throughput by up to 217%.
Reduces median latency by up to 78.9%.
Maintains accuracy comparable to that of the most expensive configurations.
Abstract
Agentic workflows have emerged as a powerful paradigm for solving complex, multi-stage tasks, but serving them at scale is computationally expensive given the many LLM inferences that each request must pass through. Configuration selection, or the cost-aware assignment of workflow agents to specific LLMs, can reduce these costs, but existing approaches bind configuration decisions before request execution, making them ill-suited for the heterogeneous and lengthy execution of workflows. Specifically, system loads can fluctuate rapidly and substantially during a request's lifetime, causing fixed configurations to quickly become suboptimal. We present Aragog, a system that progressively adapts a request's configuration throughout its execution to match runtime dynamics. To make this practical despite the massive space of workflow configurations, Aragog decouples the problem into two core…
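The idea of binding a configuration progressively, rather than before execution, can be illustrated with a small Python sketch. This is not Aragog's algorithm: the `Model` class, the queue-length load signal, the `route_next_stage` and `run_request` helpers, and the pre-computed list of acceptable configurations are all hypothetical names and assumptions introduced for illustration. The sketch only shows the general pattern of choosing the model for each stage at the moment that stage starts, restricted to configurations consistent with the stages already executed.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical model endpoint with a simple load signal."""
    name: str
    cost_per_call: float        # assumed relative cost of one inference
    service_time: float = 1.0   # assumed seconds per request
    queue_len: int = 0          # requests currently queued (assumed load signal)

    def estimated_wait(self) -> float:
        # Naive estimate: every queued request plus this one takes service_time.
        return (self.queue_len + 1) * self.service_time


def route_next_stage(prefix, candidates, models):
    """Choose the model for the next stage just before it runs.

    prefix:     tuple of model names used by the stages executed so far
    candidates: full configurations (one model name per stage) assumed to
                have been identified ahead of time as accurate enough
    models:     dict mapping model name to Model
    """
    stage = len(prefix)
    # Keep only configurations that agree with what has already executed.
    viable = [c for c in candidates if c[:stage] == prefix]
    if not viable:
        raise ValueError("no candidate configuration matches the executed prefix")
    options = {c[stage] for c in viable}
    # Prefer the lowest estimated wait under current load; break ties by cost.
    return min(
        options,
        key=lambda n: (models[n].estimated_wait(), models[n].cost_per_call),
    )


def run_request(candidates, models):
    """Walk a request through every stage, re-reading load at each step."""
    prefix = ()
    for _ in range(len(candidates[0])):
        prefix += (route_next_stage(prefix, candidates, models),)
    return prefix


if __name__ == "__main__":
    models = {
        "small": Model("small", cost_per_call=1.0, service_time=0.5),
        "large": Model("large", cost_per_call=10.0, service_time=2.0),
    }
    # Assumed candidate set: each configuration uses the large model once.
    candidates = [
        ("large", "small", "small"),
        ("small", "large", "small"),
        ("small", "small", "large"),
    ]
    print(run_request(candidates, models))
```

Because the load is re-read at every stage, a request whose preferred model becomes congested mid-execution can still move to another viable configuration, which a configuration fixed before execution cannot do.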
Taxonomy
Topics: Scientific Computing and Data Management · Business Process Modeling and Analysis · Cloud Computing and Resource Management
