EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization

Zhiye Song; Kyungmi Lee; Eun Kyung Lee; Xin Zhang; Tamar Eilam; Anantha P. Chandrakasan

arXiv:2605.14249 · cs.LG · May 15, 2026

TL;DR

EnergyLens is an end-to-end framework that predicts and optimizes energy consumption for multi-GPU large language model (LLM) inference, helping practitioners choose among deployment configurations when exhaustive profiling is impractical.

Contribution

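It introduces an einsum-based interface for specifying fusion, parallelism, and compute-communication overlap, together with load-imbalance-aware MoE modeling and an empirically driven communication energy model, enabling energy-aware optimization without exhaustive profiling.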

Findings

01

Achieves mean absolute percentage errors (MAPEs) between 9.25% and 13.19% for energy prediction.

02

Reveals up to 1.47x and 52.9x energy variation across configurations.

03

Identifies Pareto-optimal configurations for compute-communication overlap.
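
The two metrics named in the findings, MAPE and Pareto optimality, can be illustrated with a minimal Python sketch. This is not the EnergyLens implementation: the function names, the configuration labels, and every number below are hypothetical, and treating latency and energy as the two objectives to minimize is an assumption made for illustration only.

```python
from typing import List, Tuple

# (name, latency, energy); all values here are illustrative, not from the paper.
Config = Tuple[str, float, float]


def mape(measured: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other configuration dominates.

    A configuration is dominated if another one is no worse in both
    latency and energy and strictly better in at least one of them.
    """
    front = []
    for name, lat, en in configs:
        dominated = any(
            (l2 <= lat and e2 <= en) and (l2 < lat or e2 < en)
            for _, l2, e2 in configs
        )
        if not dominated:
            front.append((name, lat, en))
    # Sort by latency so the trade-off curve reads left to right.
    return sorted(front, key=lambda c: c[1])


# Hypothetical measured vs. predicted energy values (arbitrary units).
measured_energy = [100.0, 200.0, 400.0]
predicted_energy = [110.0, 180.0, 400.0]
prediction_error = mape(measured_energy, predicted_energy)

# Hypothetical overlap configurations: (name, latency, energy).
candidates = [
    ("no_overlap", 10.0, 50.0),
    ("partial_overlap", 8.0, 55.0),
    ("full_overlap", 7.0, 70.0),
    ("misconfigured", 9.0, 60.0),
]
front = pareto_front(candidates)
```

In this toy data, `misconfigured` is dropped because `partial_overlap` is both faster and cheaper in energy; the remaining three each trade latency against energy, so none can be ruled out without a preference between the two objectives.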

Abstract

We present EnergyLens, an end-to-end framework for energy-aware large language model (LLM) inference optimization. As LLMs scale, predicting and reducing their energy footprint has become critical for sustainability and datacenter operations, yet existing approaches either require production-level code and expensive profiling or fail to accurately capture multi-GPU energy behavior. As a result, practitioners lack tools for deciding which optimizations to prioritize and for selecting among existing deployment configurations when exhaustive profiling is impractical. EnergyLens addresses this gap with an intuitive einsum-based interface that captures LLM specifications including fusion, parallelism, and compute-communication overlap, combined with load-imbalance-aware MoE modeling and an empirically driven communication energy model for multi-GPU settings. We validate EnergyLens on Llama3…
