Sustainable LLM Inference using Context-Aware Model Switching
Yuvarani, Akashdeep Singh, Zahra Fathanah, Salsabila Harlen, Syeikha Syafura Al-Zahra binti Zahari, Hema Subramaniam

TL;DR
This paper introduces a context-aware model switching system for large language model inference that significantly reduces energy consumption and latency by dynamically selecting models based on query complexity, while maintaining high output quality.
Contribution
The paper presents a novel, adaptive inference framework combining caching, rule-based scoring, ML classification, and user adaptation for energy-efficient LLM deployment.
Findings
Energy consumption reduced by up to 67.5%
Response quality maintained at 93.6% F1 score
Response time for simple queries improved by 68%
Abstract
Large language models have become central to many AI applications, but their growing energy consumption raises serious sustainability concerns. A key limitation in current AI deployments is the reliance on a one-size-fits-all inference strategy: most systems route every request to the same large model regardless of task complexity, leading to substantial and unnecessary energy waste. To address this issue, we propose a context-aware model switching approach that dynamically selects an appropriate language model based on query complexity. The proposed system, Context-Aware Model Switching for Energy-Efficient LLM Inference, combines caching for repeated queries, rule-based complexity scoring for fast and explainable decisions, machine learning classification to capture semantic intent, and a user-adaptive component that learns from interaction patterns over time. The…
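To make the four-part design described in the abstract more concrete, the following is a minimal sketch of how such a router might be wired together. It is not the authors' implementation: the model names, keyword list, weights, threshold, and the `classifier_score` stub (standing in for a trained classifier) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical model tiers; the source does not name specific models.
SMALL, LARGE = "small-model", "large-model"

# Illustrative keywords hinting at multi-step reasoning (an assumption).
COMPLEX_HINTS = ("explain", "compare", "derive", "prove", "analyze", "why")


def rule_score(query: str) -> float:
    """Cheap, explainable complexity score in [0, 1] from length and keywords."""
    words = query.lower().split()
    length_part = min(len(words) / 40.0, 1.0)
    has_hint = any(w.strip("?.,!") in COMPLEX_HINTS for w in words)
    return 0.5 * length_part + 0.5 * (1.0 if has_hint else 0.0)


def classifier_score(query: str) -> float:
    """Stub for a trained intent classifier; returns a made-up P(complex)."""
    return 0.9 if "step by step" in query.lower() else 0.2


@dataclass
class Router:
    threshold: float = 0.4                           # assumed cut-off
    cache: dict = field(default_factory=dict)        # normalized query -> response
    user_bias: dict = field(default_factory=dict)    # user id -> score offset

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so trivially repeated queries hit the cache.
        return " ".join(query.lower().split())

    def route(self, user: str, query: str) -> str:
        """Return 'cache', SMALL, or LARGE for this user and query."""
        if self._key(query) in self.cache:
            return "cache"
        score = 0.5 * rule_score(query) + 0.5 * classifier_score(query)
        score += self.user_bias.get(user, 0.0)
        return LARGE if score >= self.threshold else SMALL

    def record(self, query: str, response: str) -> None:
        """Store a response so a repeated query can skip inference."""
        self.cache[self._key(query)] = response

    def feedback(self, user: str, satisfied: bool) -> None:
        """Nudge an unsatisfied user toward the larger model (capped at 0.3)."""
        if not satisfied:
            bias = self.user_bias.get(user, 0.0) + 0.05
            self.user_bias[user] = min(bias, 0.3)


router = Router()
print(router.route("u1", "What is the capital of France?"))
print(router.route("u1", "Explain step by step why transformers scale better than RNNs"))
```

In this sketch the cache is checked first because it is the cheapest path, and the rule-based and classifier scores are simply averaged; a real system would presumably tune that combination and the threshold against measured energy and quality.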
Taxonomy
Topics: Big Data and Digital Economy · Green IT and Sustainability · IoT and Edge/Fog Computing
