Aging-aware CPU Core Management for Embodied Carbon Amortization in Cloud LLM Inference
Tharindu B. Hewage, Shashikant Ilager, Maria Rodriguez Read, Rajkumar Buyya

TL;DR
This paper introduces an aging-aware CPU core management method for cloud LLM inference that extends CPU lifespan and reduces embodied carbon emissions by 37.67%, with less than 10% impact on inference service quality.
Contribution
It presents a novel technique leveraging CPU underutilization patterns to delay aging effects, enabling longer CPU use and lower embodied carbon in cloud LLM inference clusters.
Findings
37.67% reduction in embodied carbon emissions
77% decrease in CPU underutilization
Less than 10% impact on inference service quality
Abstract
Broad adoption of Large Language Models (LLMs) demands rapid expansion of cloud LLM inference clusters, leading to an accumulation of embodied carbon (the emissions from manufacturing and supplying IT assets) that is mostly concentrated in inference server CPUs. This paper delves into the challenges of sustainable growth of cloud LLM inference, emphasizing extended amortization of CPU embodied carbon over an increased lifespan. Given the reliability risks of silicon aging, we propose an aging-aware CPU core management technique to delay CPU aging effects, allowing the cluster operator to safely increase CPU life. Our technique exploits CPU underutilization patterns that we uncover in cloud LLM inference by halting aging in unused cores and evening out aging across active cores via selective deep idling and aging-aware inference task allocation. Through extensive simulations using real-world Azure…
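The two mechanisms named in the abstract, aging-aware task allocation and selective deep idling, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Core` class, the `stress_seconds` field (a stand-in proxy for accumulated aging), and the `allocate` and `park_unused` helpers are all hypothetical names introduced here, and the real aging model is not given in the text above.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    stress_seconds: float = 0.0  # hypothetical proxy for accumulated aging
    busy: bool = False
    deep_idle: bool = False


def allocate(cores, task_seconds):
    """Assign a task to the least-aged free core; return it, or None if all are busy."""
    free = [c for c in cores if not c.busy]
    if not free:
        return None
    # Choosing the least-aged core evens out aging across active cores.
    target = min(free, key=lambda c: (c.stress_seconds, c.core_id))
    target.busy = True
    target.deep_idle = False  # wake the core if it was parked
    target.stress_seconds += task_seconds
    return target


def park_unused(cores, keep_awake):
    """Deep-idle free cores, keeping only the `keep_awake` least-aged ones awake.

    Assumption: a deep-idled core accrues no further aging while parked.
    Returns the ids of the parked cores.
    """
    free = sorted((c for c in cores if not c.busy),
                  key=lambda c: (c.stress_seconds, c.core_id))
    for c in free[:keep_awake]:
        c.deep_idle = False
    parked = free[keep_awake:]
    for c in parked:
        c.deep_idle = True
    return [c.core_id for c in parked]


cores = [Core(0, 100.0), Core(1, 20.0), Core(2, 50.0)]
chosen = allocate(cores, task_seconds=10.0)   # picks the least-aged core
parked_ids = park_unused(cores, keep_awake=1)  # parks the most-aged spare core
```

In this sketch the most-aged spare cores are parked first, so the cores that remain available for new tasks are the ones with the most remaining headroom; a real policy would also have to account for wake-up latency and service-quality targets.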
Taxonomy
Topics: Cloud Computing and Resource Management
