EcoServe: Designing Carbon-Aware AI Inference Systems
Yueying Li, Zhanqiu Hu, Esha Choukse, Rodrigo Fonseca, G. Edward Suh, Udit Gupta

TL;DR
EcoServe is a framework that reduces carbon emissions in AI inference by optimizing resource provisioning and scheduling, leveraging insights from real-world deployment to balance performance and environmental impact.
Contribution
The paper introduces EcoServe, a novel carbon-aware resource management system for LLM inference that significantly reduces emissions while maintaining performance.
Findings
EcoServe can cut carbon emissions by up to 47%.
GPUs dominate operational carbon, while host processing systems (CPUs, memory, storage) dominate embodied carbon.
Offline batch inference can account for over half of serving capacity.
Abstract
The rapid increase in LLM ubiquity and scale places unprecedented demands on computing infrastructure. These demands not only require large compute and memory resources but also consume significant energy, yielding large operational and embodied carbon emissions. In this work, we present three main observations based on modeling and on traces from the production deployment of two Generative AI services at a major cloud service provider. First, while GPUs dominate operational carbon, host processing systems (e.g., CPUs, memory, storage) dominate embodied carbon. Second, offline batch inference accounts for a significant portion (up to 55%) of serving capacity. Third, LLM inference exhibits different levels of heterogeneity across hardware and workloads. Based on these observations, we design EcoServe, a carbon-aware resource provisioning and scheduling framework for LLM serving systems. It is…
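The first observation separates operational carbon from embodied carbon. A common way to model the two (not necessarily EcoServe's exact model) is: operational carbon equals the energy a component consumes multiplied by the grid's carbon intensity, and embodied (manufacturing) carbon is amortized over the component's service lifetime in proportion to the time it is in use. The minimal Python sketch below assumes linear amortization and a constant average power draw; the `Component` class, the function names, and all numeric inputs are hypothetical placeholders, not values or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One hardware component of a serving node (hypothetical inputs)."""
    name: str
    avg_power_w: float       # assumed constant average power draw, in watts
    embodied_kgco2e: float   # manufacturing (embodied) footprint, kg CO2e
    lifetime_hours: float    # assumed service lifetime used for amortization


def operational_kg(c: Component, hours: float, grid_g_per_kwh: float) -> float:
    """Operational carbon: energy used (kWh) times grid carbon intensity (g/kWh)."""
    energy_kwh = c.avg_power_w * hours / 1000.0
    return energy_kwh * grid_g_per_kwh / 1000.0  # convert grams to kg


def embodied_kg(c: Component, hours: float) -> float:
    """Embodied carbon amortized linearly over the component's lifetime."""
    return c.embodied_kgco2e * hours / c.lifetime_hours


def breakdown(components, hours, grid_g_per_kwh):
    """Map each component name to its (operational, embodied) carbon in kg."""
    return {
        c.name: (operational_kg(c, hours, grid_g_per_kwh), embodied_kg(c, hours))
        for c in components
    }


if __name__ == "__main__":
    # Placeholder inputs chosen only to exercise the arithmetic.
    parts = [
        Component("gpu", 400.0, 100.0, 40_000.0),
        Component("host", 200.0, 400.0, 40_000.0),
    ]
    for name, (op, emb) in breakdown(parts, hours=1_000.0, grid_g_per_kwh=500.0).items():
        print(f"{name}: operational={op:.1f} kg, embodied={emb:.1f} kg")
    # gpu: operational=200.0 kg, embodied=2.5 kg
    # host: operational=100.0 kg, embodied=10.0 kg
```

With these placeholder inputs the GPU dominates the operational term while the host dominates the embodied term, which mirrors the shape (not the magnitudes) of the first observation. Lowering the grid carbon intensity shrinks only the operational term, which is one reason embodied carbon from host systems becomes relatively more important on cleaner grids.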
Taxonomy
Topics: Big Data and Business Intelligence
