Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference

Jiaming Cheng; Duong Tung Nguyen

arXiv:2507.09942 · cs.NI · April 10, 2026

TL;DR

Green-LLM is a multi-objective optimization framework that efficiently allocates LLM inference workloads across edge data centers, balancing cost, environmental impact, and latency.

Contribution

It introduces a new constrained optimization model, built on real-world operational constraints, for environmentally-aware workload distribution in distributed LLM inference.

Findings

1. Significantly reduces carbon emissions and water consumption.
2. Keeps operational costs within 3% of the minimum.
3. Ensures sub-2-second response latency.

Abstract

This paper investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers over time. Each data center features on-site renewable generation and faces dynamic electricity prices and spatiotemporal variability in renewable availability. We propose Green-LLM, a lexicographic multi-objective optimization framework that addresses this challenge without requiring manual weight tuning. The proposed model incorporates real-world constraints, including token-dependent processing delay and energy consumption, heterogeneous hardware capabilities, dynamic renewable generation, and spatiotemporal variations in electricity prices and carbon intensity. Unlike existing approaches that optimize individual environmental metrics in isolation, Green-LLM jointly minimizes operational cost, carbon emissions, and delay penalty while enforcing water…
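The lexicographic idea in the abstract is that objectives are optimized in priority order rather than blended with hand-tuned weights. Below is a minimal brute-force sketch of that idea on a toy problem. It assumes a priority order of cost, then carbon, then delay; a 3% relaxation at each stage (borrowed from the cost finding above); a fixed energy per request instead of the paper's token-dependent model; and no water constraint. The site data and all names (`DataCenter`, `objectives`, `lexicographic_allocate`, `ENERGY_PER_REQ_KWH`) are hypothetical. This is not the authors' formulation. A realistic instance would need an optimization solver rather than enumeration.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical constant; the paper models token-dependent energy instead.
ENERGY_PER_REQ_KWH = 0.002


@dataclass
class DataCenter:
    name: str
    price: float    # hypothetical electricity price, $/kWh
    carbon: float   # hypothetical grid carbon intensity, kgCO2/kWh
    delay_s: float  # hypothetical per-request processing delay, seconds
    capacity: int   # max requests this site can serve in the time slot


def objectives(alloc, sites):
    """Return (cost, carbon, total delay) for a per-site request split."""
    energy = [n * ENERGY_PER_REQ_KWH for n in alloc]
    cost = sum(e * s.price for e, s in zip(energy, sites))
    carbon = sum(e * s.carbon for e, s in zip(energy, sites))
    delay = sum(n * s.delay_s for n, s in zip(alloc, sites))
    return cost, carbon, delay


def lexicographic_allocate(sites, demand, tol=0.03):
    # Enumerate every integer split of `demand` that respects site capacities.
    candidates = [a for a in product(*(range(s.capacity + 1) for s in sites))
                  if sum(a) == demand]
    if not candidates:
        raise ValueError("demand exceeds total capacity")
    # Stages 1-2 (cost, then carbon): keep only splits within `tol` of the
    # best value of that objective among the splits that survived so far.
    for k in range(2):
        best = min(objectives(a, sites)[k] for a in candidates)
        candidates = [a for a in candidates
                      if objectives(a, sites)[k] <= best * (1 + tol) + 1e-12]
    # Final stage: pick the surviving split with the smallest total delay.
    return min(candidates, key=lambda a: objectives(a, sites)[2])


sites = [
    DataCenter("A", price=0.10, carbon=0.50, delay_s=0.5, capacity=10),
    DataCenter("B", price=0.12, carbon=0.10, delay_s=1.0, capacity=10),
]
print(lexicographic_allocate(sites, demand=10))  # → (9, 1)
```

With these toy numbers, minimizing cost alone sends all 10 requests to site A. The lexicographic pass accepts a split that costs 2% more but emits 8% less carbon, which stays inside the 3% cost tolerance. No weights are needed. The tolerance alone controls how much a higher-priority objective may be sacrificed.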
