Towards Sustainable Large Language Model Serving
Sophia Nguyen, Beihao Zhou, Yi Ding, Sihang Liu

TL;DR
This paper analyzes the carbon emissions of large language model serving, covering both operational and embodied emissions. It characterizes performance and energy use on two GPU types and models emissions using carbon intensities from three grid regions, to inform more sustainable serving.
Contribution
It introduces a comprehensive framework for evaluating and modeling the carbon footprint of LLM serving, including operational and embodied emissions, across different hardware and energy sources.
Findings
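Both operational and embodied emissions contribute to the carbon footprint of LLM serving.
Performance and energy consumption vary with GPU type and model size; operational emissions additionally depend on the carbon intensity of the grid region.
Considering operational and embodied emissions together reveals opportunities for more sustainable LLM serving.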
Abstract
In this work, we study LLMs from a carbon emission perspective, addressing both operational and embodied emissions, and paving the way for sustainable LLM serving. We characterize the performance and energy of LLaMA with 1B, 3B, and 7B parameters using two Nvidia GPU types, a latest-generation RTX6000 Ada and an older-generation T4. We analytically model operational carbon emissions based on energy consumption and carbon intensities from three grid regions -- each representing a different energy source mix, and embodied carbon emissions based on chip area and memory size. Our characterization and modeling provide us with an in-depth understanding of the performance, energy, and carbon emissions of LLM serving. Our findings highlight the potential for optimizing sustainable LLM serving systems by considering both operational and embodied carbon emissions simultaneously.
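The abstract describes two analytical models: operational emissions derived from energy consumption and grid carbon intensity, and embodied emissions derived from chip area and memory size. The following is a minimal sketch of what such models might look like, not the authors' implementation. The function names, the per-area and per-gigabyte coefficients, and the lifetime-based amortization step are all assumptions introduced here for illustration; the paper's actual formulas and constants may differ.

```python
from dataclasses import dataclass

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def operational_carbon_g(energy_joules: float, intensity_g_per_kwh: float) -> float:
    """Operational emissions: energy consumed times grid carbon intensity."""
    return energy_joules / JOULES_PER_KWH * intensity_g_per_kwh


@dataclass
class Gpu:
    """Hypothetical hardware description used by the embodied model."""
    die_area_cm2: float
    memory_gb: float


def embodied_carbon_g(gpu: Gpu, g_per_cm2: float, g_per_gb: float) -> float:
    """Embodied emissions as a linear function of chip area and memory size.

    The two coefficients are assumed inputs (manufacturing carbon per cm^2
    of die and per GB of memory); they are not taken from the paper.
    """
    return gpu.die_area_cm2 * g_per_cm2 + gpu.memory_gb * g_per_gb


def amortized_embodied_g(total_g: float, lifetime_s: float, busy_s: float) -> float:
    """Assumed amortization: charge a workload its share of device lifetime."""
    return total_g * busy_s / lifetime_s


def total_carbon_g(operational_g: float, embodied_share_g: float) -> float:
    """Combined footprint of a workload under the assumptions above."""
    return operational_g + embodied_share_g
```

With placeholder inputs, a workload that draws 3.6 MJ on a grid at 400 gCO2e/kWh would be charged 400 g of operational carbon under this sketch. Comparing that figure with the amortized embodied share for a newer and an older GPU is one way to explore the trade-off the abstract points to.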
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: LLaMA · Adaptive Discriminator Augmentation
