EnergyLens: Interpretable Closed-Form Energy Models for Multimodal LLM Inference Serving
Vittorio Palladino, Gianluca Palermo, Michael E. Papka, Zhiling Lan

TL;DR
EnergyLens introduces an interpretable, closed-form energy model for multimodal LLM inference that requires minimal profiling data and accurately predicts energy consumption across diverse configurations and hardware.
Contribution
EnergyLens uses symbolic regression to derive a physically interpretable closed-form energy model that outperforms black-box surrogates in both accuracy and data efficiency.
Findings
Achieves 88.2% configuration selection accuracy.
Requires only 50 profiling measurements for fitting.
Generalizes reliably to unseen batch sizes and hardware.
Abstract
As large language models span dense, mixture-of-experts, and state-space architectures and are deployed on heterogeneous accelerators under increasingly diverse multimodal workloads, optimizing inference energy has become as critical as optimizing latency and throughput. Existing approaches either treat latency as an energy proxy or rely on data-hungry black-box surrogates. Both fail under varying parallelism strategies: latency and energy optima diverge in over 20% of the configurations we tested, and black-box surrogates require hundreds of profiling samples to generalize across model families and hardware. We present EnergyLens, which uses symbolic regression as a structure-discovery tool over profiling data to derive a single twelve-parameter closed-form energy model expressed in terms of system properties such as degree of parallelism, batch size, and sequence length. Unlike black-box…
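To make concrete why latency can be a poor proxy for energy under varying parallelism, the following is a minimal sketch using a toy closed-form model. The functional form, the `ToyParams` fields, and every constant are hypothetical assumptions chosen for illustration; they are not the twelve-parameter model derived in the paper, whose form is not given in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyParams:
    # All values are made-up illustrative constants, not measured data.
    compute_s_per_token: float = 1e-5   # compute time per token on one GPU
    comm_s_per_extra_gpu: float = 2e-3  # communication overhead per added GPU
    fixed_s: float = 5e-3               # fixed per-batch overhead
    watts_per_gpu: float = 300.0        # assumed constant power draw per GPU


def latency_s(tp: int, batch: int, seq_len: int, p: ToyParams) -> float:
    """Toy latency: compute shrinks with parallelism, communication grows."""
    compute = p.compute_s_per_token * batch * seq_len / tp
    comm = p.comm_s_per_extra_gpu * (tp - 1)
    return compute + comm + p.fixed_s


def energy_j(tp: int, batch: int, seq_len: int, p: ToyParams) -> float:
    """Toy energy: every participating GPU draws power for the whole batch."""
    return tp * p.watts_per_gpu * latency_s(tp, batch, seq_len, p)


params = ToyParams()
tp_choices = [1, 2, 4, 8]  # candidate degrees of parallelism
batch, seq_len = 8, 1024

# Pick the best degree of parallelism under each objective.
fastest_tp = min(tp_choices, key=lambda tp: latency_s(tp, batch, seq_len, params))
greenest_tp = min(tp_choices, key=lambda tp: energy_j(tp, batch, seq_len, params))

for tp in tp_choices:
    print(tp, latency_s(tp, batch, seq_len, params), energy_j(tp, batch, seq_len, params))
print("latency-optimal:", fastest_tp, "energy-optimal:", greenest_tp)
```

In this toy setting, adding GPUs keeps reducing latency while total energy rises, so the two objectives select different configurations. A closed-form model of this kind can be evaluated over every candidate configuration at negligible cost, which is what makes it usable for configuration selection.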
