Energy-Aware Routing to Large Reasoning Models
Austin R. Ellis-Mohr, Max Hartman, and Lav R. Varshney

TL;DR
This paper explores energy-aware routing strategies for large reasoning models, focusing on balancing energy provisioning and system performance amidst stochastic fluctuations.
Contribution
It introduces a second-order, variance-aware framework for model routing that optimizes energy efficiency and performance in large reasoning systems.
Findings
Identifies the critical operating regime balancing energy supply and waste.
Develops a variance-aware routing approach based on compute scaling laws.
Provides theoretical insights into energy-performance trade-offs in LRM systems.
Abstract
Large reasoning models (LRMs) have heterogeneous inference energy costs based on which model is used and how much it reasons. To reduce energy, it is important to choose the right LRM and operate it in the right way. As a result, the performance of systems that dispatch tasks to different individual LRMs depends on the balance between mean energy provisioning and stochastic fluctuations. The critical regime is the unique operating point at which neither auxiliary energy nor baseline energy is systematically wasted. Increasing baseline supply shifts the system toward persistent over-supply and baseline-energy waste, while reducing supply induces persistent reliance on auxiliary energy. Yet in this regime, performance remains volatility-limited and so a second-order characterization provides further insights that we develop. Here, performance is governed by how variability is absorbed…
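The trade-off between baseline and auxiliary energy described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes, purely for illustration, that aggregate energy demand D is Gaussian with a known mean and standard deviation, and that a fixed baseline supply B is provisioned. The function name `gaussian_energy_gaps` and all numbers are hypothetical. Under these assumptions the expected shortfall E[(D - B)+] (covered by auxiliary energy) and the expected surplus E[(B - D)+] (wasted baseline energy) have closed forms. They are equal only when B matches the mean demand, and at that point both are proportional to the standard deviation, which is one way to read the statement that performance in the critical regime is volatility-limited.

```python
import math


def gaussian_energy_gaps(mean_demand, std_demand, baseline):
    """Expected auxiliary energy and wasted baseline energy.

    Assumes demand D ~ Normal(mean_demand, std_demand**2); this is an
    illustrative assumption, not something taken from the paper.
    Returns (E[(D - B)+], E[(B - D)+]) for baseline supply B.
    """
    z = (baseline - mean_demand) / std_demand
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF at z
    auxiliary = std_demand * (pdf - z * (1.0 - cdf))  # expected shortfall E[(D - B)+]
    waste = std_demand * (pdf + z * cdf)              # expected surplus E[(B - D)+]
    return auxiliary, waste


if __name__ == "__main__":
    # Hypothetical numbers: mean demand 100, standard deviation 10.
    for baseline in (90.0, 100.0, 110.0):
        aux, waste = gaussian_energy_gaps(100.0, 10.0, baseline)
        print(f"baseline={baseline:6.1f}  auxiliary={aux:7.3f}  waste={waste:7.3f}")
```

In this toy setting, lowering the baseline below mean demand makes the auxiliary term dominate, raising it makes the waste term dominate, and at the balance point both equal the standard deviation divided by sqrt(2π).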
