EcoServe: Designing Carbon-Aware AI Inference Systems

Yueying Li; Zhanqiu Hu; Esha Choukse; Rodrigo Fonseca; G. Edward Suh; Udit Gupta

arXiv:2502.05043 · cs.DC · March 18, 2025 · 3 cites

TL;DR

EcoServe is a carbon-aware framework that reduces the carbon emissions of LLM inference by optimizing resource provisioning and scheduling, using observations from production deployment traces to balance performance against environmental impact.

Contribution

The paper introduces EcoServe, a carbon-aware resource provisioning and scheduling framework for LLM inference that cuts carbon emissions by up to 47% while maintaining performance.

Findings

01

EcoServe can cut carbon emissions by up to 47%.

02

GPUs dominate operational carbon, but host processing systems (e.g., CPUs, memory, storage) dominate embodied carbon.

03

Offline batch inference can account for up to 55% of serving capacity.

Abstract

The rapid increase in LLM ubiquity and scale levies unprecedented demands on computing infrastructure. These demands not only incur large compute and memory resources but also significant energy, yielding large operational and embodied carbon emissions. In this work, we present three main observations based on modeling and traces from the production deployment of two Generative AI services in a major cloud service provider. First, while GPUs dominate operational carbon, host processing systems (e.g., CPUs, memory, storage) dominate embodied carbon. Second, offline, batch inference accounts for a significant portion (up to 55%) of serving capacity. Third, there are different levels of heterogeneity across hardware and workloads for LLM inference. Based on these observations, we design EcoServe, a carbon-aware resource provision and scheduling framework for LLM serving systems. It is…
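
The distinction the abstract draws between operational and embodied carbon can be illustrated with a minimal accounting sketch. This is not EcoServe's model: it only applies the common convention that operational carbon is energy consumed times grid carbon intensity, and that embodied (manufacturing) carbon is amortized linearly over an assumed hardware lifetime. The `Component` class, the function names, and every numeric value are hypothetical placeholders, not figures from the paper.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One piece of server hardware (all fields are illustrative assumptions)."""
    name: str
    avg_power_watts: float   # assumed average power draw while serving
    embodied_kgco2e: float   # assumed manufacturing footprint in kgCO2e
    lifetime_hours: float    # assumed service lifetime used for amortization


def operational_kgco2e(c: Component, hours: float, grid_gco2e_per_kwh: float) -> float:
    """Operational carbon: energy used (kWh) times grid carbon intensity."""
    energy_kwh = c.avg_power_watts * hours / 1000.0
    return energy_kwh * grid_gco2e_per_kwh / 1000.0  # grams -> kilograms


def embodied_kgco2e(c: Component, hours: float) -> float:
    """Embodied carbon attributed to this usage window (linear amortization)."""
    return c.embodied_kgco2e * hours / c.lifetime_hours


def footprint(components, hours: float, grid_gco2e_per_kwh: float) -> dict:
    """Sum operational and amortized embodied carbon over all components."""
    op = sum(operational_kgco2e(c, hours, grid_gco2e_per_kwh) for c in components)
    emb = sum(embodied_kgco2e(c, hours) for c in components)
    return {"operational": op, "embodied": emb, "total": op + emb}


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the arithmetic.
    gpu = Component("gpu", avg_power_watts=400.0, embodied_kgco2e=150.0, lifetime_hours=40_000.0)
    host = Component("host", avg_power_watts=200.0, embodied_kgco2e=600.0, lifetime_hours=40_000.0)
    print(footprint([gpu, host], hours=100.0, grid_gco2e_per_kwh=500.0))
```

Keeping the two terms separate makes it possible to see which component drives which kind of carbon, which is the kind of breakdown the first observation in the abstract relies on.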

Taxonomy

Topics: Big Data and Business Intelligence