Sustainable LLM Inference using Context-Aware Model Switching

Yuvarani; Akashdeep Singh; Zahra Fathanah; Salsabila Harlen; Syeikha Syafura Al-Zahra binti Zahari; Hema Subramaniam

arXiv:2602.22261·cs.LG·February 27, 2026

TL;DR

This paper introduces a context-aware model switching system for large language model inference that selects a model dynamically based on query complexity. It reduces energy consumption by up to 67.5% and improves response time for simple queries by 68%, while maintaining response quality at a 93.6% F1 score.

Contribution

The paper presents a novel, adaptive inference framework combining caching, rule-based scoring, ML classification, and user adaptation for energy-efficient LLM deployment.

Findings

01 Energy consumption reduced by up to 67.5%

02 Response quality maintained at 93.6% F1 score

03 Response time for simple queries improved by 68%

Abstract

Large language models have become central to many AI applications, but their growing energy consumption raises serious sustainability concerns. A key limitation in current AI deployments is the reliance on a one-size-fits-all inference strategy: most systems route every request to the same large model, regardless of task complexity, leading to substantial and unnecessary energy waste. To address this issue, we propose a context-aware model switching approach that dynamically selects an appropriate language model based on query complexity. The proposed system, Context-Aware Model Switching for Energy-Efficient LLM Inference, combines caching for repeated queries, rule-based complexity scoring for fast and explainable decisions, machine learning classification to capture semantic intent, and a user-adaptive component that learns from interaction patterns over time. The…
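The first two stages named in the abstract (a cache for repeated queries, then rule-based complexity scoring) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the model names, the cue-word list, the 50-word length cap, the equal weighting, and the 0.4 threshold are all hypothetical. The ML classifier and the user-adaptive component are left out.

```python
from dataclasses import dataclass, field

# Hypothetical model tiers; the source does not name specific models.
SMALL_MODEL = "small-model"
LARGE_MODEL = "large-model"

# Hypothetical cue words that hint at a complex request.
COMPLEX_CUES = ("explain", "compare", "analyze", "derive", "prove", "design")


def complexity_score(query: str) -> float:
    """Rule-based score in [0, 1]: longer queries and cue words raise it."""
    words = query.lower().split()
    length_part = min(len(words) / 50.0, 1.0)  # assumed 50-word cap
    has_cue = any(w.strip("?.,!") in COMPLEX_CUES for w in words)
    return 0.5 * length_part + 0.5 * (1.0 if has_cue else 0.0)


@dataclass
class ModelSwitcher:
    threshold: float = 0.4  # assumed cut-off, not a value from the paper
    cache: dict = field(default_factory=dict)

    def route(self, query: str) -> str:
        """Return "cache" for a repeated query, else the chosen model tier."""
        key = " ".join(query.lower().split())  # normalize case and whitespace
        if key in self.cache:
            return "cache"
        score = complexity_score(query)
        choice = LARGE_MODEL if score >= self.threshold else SMALL_MODEL
        # A real system would store the generated response here;
        # this sketch only records which tier handled the query.
        self.cache[key] = choice
        return choice
```

For example, `ModelSwitcher().route("What is 2+2?")` selects the small tier, and asking the same question again on the same instance is served from the cache. Putting the cache lookup first means a repeated query never reaches either model, which is where the cheapest savings would come from in a design like this.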

Taxonomy

Topics: Big Data and Digital Economy · Green IT and Sustainability · IoT and Edge/Fog Computing