GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference
Thomas Ziller, Shashikant Ilager, Alessandro Tundo, Ezio Bartocci, Leonardo Mariani, Ivona Brandic

TL;DR
GreenServ is a dynamic, context-aware routing framework for multi-model LLM inference that optimizes accuracy and energy efficiency by adaptively selecting models based on query features using a multi-armed bandit approach.
Contribution
GreenServ introduces an adaptive routing method for multi-model LLM inference that learns its routing policy online, without extensive offline calibration, while balancing accuracy against energy use.
Findings
GreenServ outperforms static and random routing baselines.
It achieves 22% higher accuracy and a 31% reduction in energy consumption compared to random routing.
It attains an average accuracy of 71.7% on benchmark tasks.
Abstract
Large language models (LLMs) demonstrate remarkable capabilities, but their broad deployment is limited by significant computational resource demands, particularly energy consumption during inference. Static, one-model-fits-all inference strategies are often inefficient, as they do not exploit the diverse range of available models or adapt to varying query requirements. This paper presents GreenServ, a dynamic, context-aware routing framework that optimizes the trade-off between inference accuracy and energy efficiency. GreenServ extracts lightweight contextual features from each query, including task type, semantic cluster, and text complexity, and routes queries to the most suitable model from a heterogeneous pool, based on observed accuracy and energy usage. We employ a multi-armed bandit approach to learn adaptive routing policies online. This approach operates under partial…
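The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names a multi-armed bandit approach but not a specific algorithm, so this example substitutes a simple per-context epsilon-greedy bandit. The class name `ContextualEpsilonGreedyRouter`, the `energy_weight` parameter, the reward form `accuracy - energy_weight * energy_norm`, and the model names are all assumptions made for illustration. The context is taken to be a hashable tuple of the features the abstract lists (task type, semantic cluster, text complexity).

```python
import random
from collections import defaultdict


class ContextualEpsilonGreedyRouter:
    """Hypothetical sketch: one epsilon-greedy bandit per discrete context."""

    def __init__(self, models, epsilon=0.1, energy_weight=0.5, seed=0):
        self.models = list(models)
        self.epsilon = epsilon              # exploration probability
        self.energy_weight = energy_weight  # assumed accuracy/energy trade-off
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)      # (context, model) -> observations
        self.means = defaultdict(float)     # (context, model) -> mean reward

    def select(self, context):
        """Pick a model for a query with the given context tuple."""
        # Try every model once per context before exploiting.
        for model in self.models:
            if self.counts[(context, model)] == 0:
                return model
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.models)
        return max(self.models, key=lambda m: self.means[(context, m)])

    def update(self, context, model, accuracy, energy_norm):
        """Record feedback for the chosen model only (bandit feedback).

        accuracy is in [0, 1]; energy_norm is energy scaled to [0, 1].
        """
        reward = accuracy - self.energy_weight * energy_norm
        key = (context, model)
        self.counts[key] += 1
        # Incremental update of the running mean reward.
        self.means[key] += (reward - self.means[key]) / self.counts[key]
        return reward


if __name__ == "__main__":
    router = ContextualEpsilonGreedyRouter(["small-model", "large-model"])
    ctx = ("qa", 3, "low")  # (task type, semantic cluster, complexity bucket)
    chosen = router.select(ctx)
    router.update(ctx, chosen, accuracy=0.6, energy_norm=0.1)
    print(chosen)
```

Only the selected model's outcome is observed at each step, so the router has to keep exploring. Raising `energy_weight` pushes the learned policy toward cheaper models, and setting it to zero reduces the objective to accuracy alone.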
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Advanced Graph Neural Networks
