Cost-Aware Model Orchestration for LLM-based Systems
Daria Smirnova, Hamid Nasiri, Marta Adamska, Zhengxin Yu, Peter Garraghan

TL;DR
This paper introduces a cost-aware model selection approach for LLM-based systems that improves accuracy and energy efficiency by better reflecting model capabilities in orchestration decisions.
Contribution
It presents an empirical analysis of LLM orchestration limitations and proposes a novel cost-aware selection method incorporating quantitative performance metrics.
Findings
Increases task accuracy by up to 11.92%.
Improves energy efficiency by up to 54%.
Reduces model selection latency from 4.51 s to 7.2 ms.
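To put the latency finding in relative terms, the short calculation below converts the two reported values (4.51 s and 7.2 ms) into a speedup factor and a percentage reduction. It uses only the figures listed above; the variable names are illustrative.

```python
# Reported model selection latencies, converted to seconds.
latency_before_s = 4.51      # 4.51 s
latency_after_s = 7.2e-3     # 7.2 ms

# Speedup factor: how many times faster the selection step becomes.
speedup = latency_before_s / latency_after_s

# Relative reduction in latency, as a percentage.
reduction_pct = (1 - latency_after_s / latency_before_s) * 100

print(f"speedup: about {speedup:.0f}x")       # speedup: about 626x
print(f"reduction: {reduction_pct:.2f}%")     # reduction: 99.84%
```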
Abstract
As modern artificial intelligence (AI) systems become more advanced and capable, they can leverage a wide range of tools and models to perform complex tasks. The task of orchestrating these models is increasingly performed by Large Language Models (LLMs) that rely on qualitative descriptions of models for decision-making. However, the descriptions provided to existing LLM-based orchestrators frequently do not reflect true model capabilities and performance characteristics, leading to suboptimal model selection, reduced task accuracy, and increased cost. In this paper, we conduct an empirical analysis of LLM-based orchestration limitations and propose a cost-aware model selection method that accounts for performance-cost trade-offs by incorporating quantitative model performance characteristics within decision-making. Initial experimental results demonstrate that our proposed method…
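The idea of selecting a model from quantitative characteristics rather than from qualitative descriptions can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not specify the scoring rule, so the linear utility (accuracy minus a weighted, normalized energy cost), the `ModelProfile` fields, the `cost_weight` parameter, and all model names and numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical quantitative profile of one candidate model."""
    name: str
    accuracy: float        # measured task accuracy in [0, 1] (placeholder value)
    energy_joules: float   # measured energy per request (placeholder value)


def select_model(profiles: list[ModelProfile], cost_weight: float = 0.5) -> ModelProfile:
    """Pick the profile with the highest accuracy-minus-cost utility.

    Assumption: utility = accuracy - cost_weight * (energy / max energy).
    The paper's actual selection criterion may differ.
    """
    if not profiles:
        raise ValueError("at least one model profile is required")
    max_energy = max(p.energy_joules for p in profiles)

    def utility(p: ModelProfile) -> float:
        # Normalize energy to [0, 1] so it is comparable with accuracy.
        return p.accuracy - cost_weight * (p.energy_joules / max_energy)

    return max(profiles, key=utility)


# Illustrative candidates; these values are invented and do not come from the paper.
candidates = [
    ModelProfile("small-model", accuracy=0.78, energy_joules=40.0),
    ModelProfile("medium-model", accuracy=0.85, energy_joules=120.0),
    ModelProfile("large-model", accuracy=0.88, energy_joules=400.0),
]

for weight in (0.0, 0.1, 0.5):
    chosen = select_model(candidates, cost_weight=weight)
    print(f"cost_weight={weight}: {chosen.name}")
```

With `cost_weight=0.0` the sketch ignores cost and picks the most accurate candidate; raising the weight shifts the choice toward cheaper models. Because the decision is a lookup over stored measurements rather than an LLM call, a selector of this shape would be expected to run far faster than prompting an orchestrator, which is consistent in spirit with the latency reduction listed under Findings.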
