Towards Sustainable Large Language Model Serving

Sophia Nguyen; Beihao Zhou; Yi Ding; Sihang Liu

arXiv:2501.01990·cs.LG·January 7, 2025

TL;DR

This paper analyzes the carbon emissions of large language model serving, considering operational and embodied emissions, and provides models to optimize sustainability based on performance, energy use, and carbon intensity across different regions.

Contribution

It introduces a comprehensive framework for evaluating and modeling the carbon footprint of LLM serving, including operational and embodied emissions, across different hardware and energy sources.

Findings

01  Operational and embodied emissions significantly impact LLM sustainability.

02  Performance and energy consumption vary with hardware and energy grid sources.

03  Optimizing both emissions can lead to more sustainable LLM deployment.

Abstract

In this work, we study LLMs from a carbon emission perspective, addressing both operational and embodied emissions, and paving the way for sustainable LLM serving. We characterize the performance and energy of LLaMA with 1B, 3B, and 7B parameters using two Nvidia GPU types: a latest-generation RTX6000 Ada and an older-generation T4. We analytically model operational carbon emissions based on energy consumption and the carbon intensities of three grid regions, each representing a different energy source mix, and we model embodied carbon emissions based on chip area and memory size. Our characterization and modeling provide an in-depth understanding of the performance, energy, and carbon emissions of LLM serving. Our findings highlight the potential for optimizing sustainable LLM serving systems by considering operational and embodied carbon emissions together.
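The abstract names the inputs to the two carbon models (energy and grid carbon intensity for operational emissions, chip area and memory size for embodied emissions) but not their functional form. The sketch below is one plausible reading, not the authors' model: it assumes operational carbon is energy multiplied by carbon intensity, embodied carbon is linear in die area and memory capacity, and embodied carbon is amortized over a hardware lifetime. The names `Gpu`, `operational_g`, `embodied_g`, and `amortized_embodied_g`, as well as every coefficient and input value, are hypothetical placeholders rather than figures from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hardware description; the field values used below are placeholders."""
    name: str
    die_area_cm2: float
    memory_gb: float


def operational_g(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Operational emissions (gCO2e) = energy used x grid carbon intensity."""
    return energy_kwh * intensity_g_per_kwh


def embodied_g(gpu: Gpu, g_per_cm2: float, g_per_gb: float) -> float:
    """Embodied emissions (gCO2e), assumed linear in die area and memory size."""
    return gpu.die_area_cm2 * g_per_cm2 + gpu.memory_gb * g_per_gb


def amortized_embodied_g(total_g: float, busy_hours: float,
                         lifetime_hours: float) -> float:
    """Share of embodied emissions attributed to one serving job (assumption:
    embodied carbon is spread evenly over the hardware lifetime)."""
    return total_g * busy_hours / lifetime_hours


if __name__ == "__main__":
    # All numbers here are illustrative placeholders, not measurements.
    gpu = Gpu(name="example-gpu", die_area_cm2=5.0, memory_gb=16.0)
    grids = {"low-carbon grid": 50.0, "mixed grid": 300.0, "fossil-heavy grid": 700.0}

    manufacturing = embodied_g(gpu, g_per_cm2=1000.0, g_per_gb=50.0)
    share = amortized_embodied_g(manufacturing, busy_hours=10.0,
                                 lifetime_hours=40000.0)

    for region, intensity in grids.items():
        op = operational_g(energy_kwh=2.0, intensity_g_per_kwh=intensity)
        print(f"{region}: operational={op:.1f} g, embodied share={share:.2f} g, "
              f"total={op + share:.2f} g")
```

Under these assumptions the same workload emits more operational carbon on a carbon-intensive grid, while its embodied share is independent of the grid, which is why the relative weight of the two terms can shift from region to region.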

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: LLaMA · Adaptive Discriminator Augmentation