MARLIN: Multi-Agent Game-Theoretic Reinforcement Learning for Sustainable LLM Inference in Cloud Datacenters

H. Moore; S. Qi; D. Milojicic; C. Bash; S. Pasricha

arXiv:2605.13496 · cs.DC · May 14, 2026

TL;DR

MARLIN is a multi-agent, game-theoretic reinforcement learning framework for LLM inference serving in cloud datacenters. It reports reductions of at least 18% in time-to-first token (TTFT), 33% in carbon emissions, 43% in water usage, and 11% in energy costs.

Contribution

The paper proposes a multi-agent game-theoretic reinforcement learning framework that co-optimizes time-to-first token (TTFT), carbon emissions, water usage, and energy costs for LLM inference serving.

Findings

01 At least 18% reduction in time-to-first token (TTFT)

02 33% decrease in carbon emissions

03 43% less water usage

Abstract

Large Language Models (LLMs) have become increasingly prevalent in cloud-based platforms, propelled by the introduction of AI-based consumer and enterprise services. LLM inference requests in particular account for up to 90% of total LLM lifecycle energy use, dwarfing training energy costs. The rising volume of LLM inference requests is increasing environmental footprints, particularly carbon emissions and water consumption. To improve sustainability for LLM inference serving in cloud datacenter environments, we propose a novel multi-agent game-theoretic reinforcement learning framework called MARLIN to co-optimize time-to-first token (TTFT), carbon emissions, water usage, and energy costs associated with LLM inference. MARLIN demonstrates a reduction of at least 18% in TTFT, 33% in carbon emissions, 43% in water usage, and 11% in energy costs compared to state-of-the-art LLM inference…
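To make "co-optimize" concrete, the sketch below folds the four metrics named in the abstract into one baseline-normalized cost and computes a percentage reduction of the kind the abstract reports. It is an illustration only: the weighted-sum scalarization, the `Metrics` type, the function names, the equal weights, and all numeric values are assumptions made here, and none of them is taken from MARLIN, whose objective the abstract does not describe.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Metrics:
    """Hypothetical per-policy measurements (the units are assumptions)."""
    ttft_s: float           # time-to-first token, seconds
    carbon_kg: float        # carbon emissions, kg CO2e
    water_l: float          # water usage, litres
    energy_cost_usd: float  # energy cost, USD


def percent_reduction(baseline: float, value: float) -> float:
    """Relative improvement over a baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


def scalarized_cost(m: Metrics, baseline: Metrics,
                    weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted sum of baseline-normalized metrics (lower is better).

    This is an assumed scalarization, not the paper's formulation.
    """
    ratios = [v / b for v, b in zip(astuple(m), astuple(baseline))]
    return sum(w * r for w, r in zip(weights, ratios))


# Made-up numbers for illustration; these are not results from the paper.
baseline = Metrics(ttft_s=2.0, carbon_kg=10.0, water_l=40.0, energy_cost_usd=5.0)
candidate = Metrics(ttft_s=1.5, carbon_kg=8.0, water_l=30.0, energy_cost_usd=4.5)

ttft_gain = percent_reduction(baseline.ttft_s, candidate.ttft_s)
cost = scalarized_cost(candidate, baseline)
```

Normalizing each metric by its baseline value keeps seconds, kilograms, litres, and dollars comparable before weighting. The baseline itself scores 1.0 under any weights that sum to 1, so a score below 1.0 means a net improvement under those weights.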
