Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference
Jiaming Cheng, Duong Tung Nguyen

TL;DR
Green-LLM is a multi-objective optimization framework that efficiently allocates LLM inference workloads across edge data centers, balancing cost, environmental impact, and latency.
Contribution
It introduces a novel optimization model, built on real-world constraints, for environmentally-aware workload distribution in distributed LLM inference.
Findings
Reduces carbon emissions and water consumption significantly.
Maintains operational costs within 3% of the minimum.
Ensures sub-2-second response latency.
Abstract
This paper investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers over time. Each data center features on-site renewable generation and faces dynamic electricity prices and spatiotemporal variability in renewable availability. We propose Green-LLM, a lexicographic multi-objective optimization framework that addresses this challenge without requiring manual weight tuning. The proposed model incorporates real-world constraints, including token-dependent processing delay and energy consumption, heterogeneous hardware capabilities, dynamic renewable generation, and spatiotemporal variations in electricity prices and carbon intensity. Unlike existing approaches that optimize individual environmental metrics in isolation, Green-LLM jointly minimizes operational cost, carbon emissions, and delay penalty while enforcing water…
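The lexicographic idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formulation. The three data centers, their per-unit cost, carbon, delay, and water coefficients, the capacities, the water budget, and the 3% relative tolerance are all hypothetical. An exhaustive search over integer workload splits stands in for the solver a real model would use, and the sketch omits renewables, token-dependent delay and energy, and time variation. Objectives are optimized in priority order (cost, then carbon, then delay). After each stage, only allocations within a relative tolerance of that stage's optimum survive. This relaxed variant of lexicographic optimization lets later objectives improve while bounding how far earlier ones can degrade, and it needs no weights.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-unit coefficients for three edge data centers (illustrative only).
COST = [1.0, 1.2, 0.9]     # operational cost per workload unit
CARBON = [0.5, 0.2, 0.8]   # carbon emissions per unit
DELAY = [0.3, 0.6, 0.4]    # delay penalty per unit
WATER = [1.0, 0.5, 2.0]    # water consumption per unit
CAPACITY = [6, 6, 6]       # maximum units each data center can serve
TOTAL = 10                 # workload units that must all be served
WATER_BUDGET = 14.0        # hard cap on total water consumption

Alloc = Tuple[int, ...]
Objective = Callable[[Alloc], float]


def feasible_allocations() -> List[Alloc]:
    """Enumerate integer splits of TOTAL that respect capacity and the water budget."""
    result = []
    for alloc in product(*(range(cap + 1) for cap in CAPACITY)):
        if sum(alloc) != TOTAL:
            continue
        if sum(w * x for w, x in zip(WATER, alloc)) > WATER_BUDGET:
            continue
        result.append(alloc)
    return result


def linear(coeffs: Sequence[float]) -> Objective:
    """Build an objective that sums coefficient * units over data centers."""
    return lambda alloc: sum(c * x for c, x in zip(coeffs, alloc))


def lexicographic(candidates: List[Alloc], objectives: List[Objective],
                  tol: float = 0.03) -> Alloc:
    """Optimize objectives in priority order. After each stage, keep only
    candidates within a relative tolerance `tol` of that stage's optimum."""
    pool = list(candidates)
    if not pool:
        raise ValueError("no feasible allocation")
    for f in objectives:
        best = min(f(a) for a in pool)
        limit = best + tol * abs(best) + 1e-9  # small epsilon absorbs float rounding
        pool = [a for a in pool if f(a) <= limit]
    # Several candidates may survive the last stage; pick the best on that objective.
    return min(pool, key=objectives[-1])


if __name__ == "__main__":
    objectives = [linear(COST), linear(CARBON), linear(DELAY)]
    choice = lexicographic(feasible_allocations(), objectives)
    print("allocation:", choice)
    print("cost=%.2f carbon=%.2f delay=%.2f" % tuple(f(choice) for f in objectives))
```

With these made-up numbers, the pure cost minimum is the split (6, 0, 4) at a cost of 9.6. Allowing the 3% tolerance lets the carbon stage choose (5, 1, 4) instead. That split costs 9.8 but emits 5.9 rather than 6.2 units of carbon. Passing `tol=0.0` recovers strict lexicographic ordering and returns (6, 0, 4).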
