Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving
Yuyang Tian, Desen Sun, Yi Ding, Sihang Liu

TL;DR
GreenCache is a dynamic, carbon-aware caching framework for LLM serving that balances environmental impact and performance, reducing carbon emissions by up to 25% while maintaining latency.
Contribution
The paper introduces GreenCache, a cache management system that allocates resources for LLM serving by weighing carbon emissions against service level objectives (SLOs).
Findings
GreenCache reduces carbon emissions by 15.1% on average.
It achieves up to 25.3% carbon reduction in real workloads.
It maintains latency constraints for over 90% of requests.
Abstract
As large language models (LLMs) become widely used, their environmental impact, especially carbon emission, has attracted more attention. Prior studies focus on compute-related carbon emissions. In this paper, we find that storage is another key contributor. LLM caching, which saves and reuses KV caches for repeated context, reduces operational carbon by avoiding redundant computation. However, this benefit comes at the cost of embodied carbon from high-capacity, high-speed SSDs. As LLMs scale, the embodied carbon of storage grows significantly. To address this tradeoff, we present GreenCache, a carbon-aware cache management framework that dynamically derives resource allocation plans for LLM serving. GreenCache analyzes the correlation between carbon emission and SLO satisfaction, reconfiguring the resource over time to keep the balance between SLO and carbon emission under dynamic…
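The tradeoff described in the abstract can be illustrated with a toy calculation. The sketch below is not GreenCache's algorithm. It only assumes that total carbon is operational carbon (energy times grid carbon intensity) plus SSD embodied carbon amortized over the device lifetime, and it picks the lowest-carbon cache size among those that meet an SLO target. Every name and number in it (CachePlan, pick_plan, the energy figures, the embodied carbon per TB, the hit rates) is a hypothetical placeholder, not a value from the paper.

```python
from dataclasses import dataclass

# All constants are hypothetical placeholders chosen for illustration only.
REQUESTS_PER_HOUR = 1_000
ENERGY_MISS_WH = 3.0            # energy per request when the context is recomputed
ENERGY_HIT_WH = 1.0             # energy per request when a cached KV prefix is reused
SSD_EMBODIED_KG_PER_TB = 30.0   # assumed embodied carbon of SSD capacity
SSD_LIFETIME_HOURS = 5 * 365 * 24


@dataclass(frozen=True)
class CachePlan:
    cache_tb: float         # SSD capacity provisioned for KV caches
    hit_rate: float         # assumed fraction of requests served from cache
    slo_attainment: float   # assumed fraction of requests meeting the latency SLO


def operational_g(plan: CachePlan, ci_g_per_kwh: float) -> float:
    """Operational carbon for one hour, in grams of CO2e."""
    wh_per_request = (plan.hit_rate * ENERGY_HIT_WH
                      + (1.0 - plan.hit_rate) * ENERGY_MISS_WH)
    kwh = REQUESTS_PER_HOUR * wh_per_request / 1000.0
    return kwh * ci_g_per_kwh


def embodied_g(plan: CachePlan) -> float:
    """SSD embodied carbon amortized to one hour of its lifetime, in grams."""
    return plan.cache_tb * SSD_EMBODIED_KG_PER_TB * 1000.0 / SSD_LIFETIME_HOURS


def pick_plan(plans, ci_g_per_kwh: float, slo_target: float = 0.90) -> CachePlan:
    """Lowest total carbon among plans meeting the SLO target.

    If no plan meets the target, fall back to the plan with the best SLO attainment.
    """
    feasible = [p for p in plans if p.slo_attainment >= slo_target]
    if not feasible:
        return max(plans, key=lambda p: p.slo_attainment)
    return min(feasible, key=lambda p: operational_g(p, ci_g_per_kwh) + embodied_g(p))


plans = [
    CachePlan(cache_tb=0.0, hit_rate=0.0, slo_attainment=0.85),
    CachePlan(cache_tb=4.0, hit_rate=0.3, slo_attainment=0.92),
    CachePlan(cache_tb=16.0, hit_rate=0.5, slo_attainment=0.95),
]

low_ci_choice = pick_plan(plans, ci_g_per_kwh=10.0)
high_ci_choice = pick_plan(plans, ci_g_per_kwh=400.0)
print(low_ci_choice.cache_tb, high_ci_choice.cache_tb)
```

With these made-up inputs, the smaller cache wins when grid carbon intensity is low, because the amortized embodied carbon of extra SSD capacity outweighs the compute it saves, and the larger cache wins when carbon intensity is high. Re-running such a selection as carbon intensity and load change is one simple way to picture the "reconfiguring the resource over time" that the abstract mentions.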
