Collaborative Multi-Agent Reinforcement Learning Approach for Elastic Cloud Resource Scaling

Bruce Fang; Danyi Gao

arXiv:2507.00550 · cs.DC · July 2, 2025

Open Access

TL;DR

This paper introduces a multi-agent reinforcement learning approach for elastic cloud resource scaling, improving responsiveness, resource utilization, and system robustness in dynamic cloud environments.

Contribution

The paper presents a novel collaborative multi-agent framework that combines a lightweight state prediction model with centralized training and decentralized execution for elastic cloud resource scaling.

Findings

01 Outperforms existing methods in resource utilization and SLA compliance

02 Improves responsiveness and reduces scheduling latency in cloud environments

03 Demonstrates strong adaptability and robustness in experiments

Abstract

This paper addresses the challenges of rapid resource variation and highly uncertain task loads in cloud computing environments. It proposes an optimization method for elastic cloud resource scaling based on a multi-agent system. The method deploys multiple autonomous agents to perceive resource states in parallel and make local decisions. While maintaining the distributed nature of the system, it introduces a collaborative value function to achieve global coordination. This improves the responsiveness of resource scheduling and enhances overall system performance. To strengthen system foresight, a lightweight state prediction model is designed. It assists agents in identifying future workload trends and optimizes the selection of scaling actions. For policy training, the method adopts a centralized training and decentralized execution reinforcement learning framework. This enables…
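To make the decision flow in the abstract concrete, here is a minimal Python sketch of the execution side only: each agent forecasts its own load, scores a few scaling actions locally, and a shared value aggregates the local scores. Everything specific is an assumption for illustration, not the paper's method. The forecaster is plain exponential smoothing with a last-step trend, the action set is scale in / hold / scale out by one replica, the cost weights are arbitrary, and the "collaborative value" is a simple additive sum of local values. Training is not shown.

```python
from dataclasses import dataclass

# Hypothetical action set: remove one replica, hold, add one replica.
ACTIONS = (-1, 0, 1)


def forecast_load(history, alpha=0.5):
    """Stand-in for the lightweight predictor (assumption):
    exponential smoothing plus the most recent one-step trend."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    trend = history[-1] - history[-2] if len(history) > 1 else 0.0
    return max(0.0, level + trend)


@dataclass
class Agent:
    replicas: int
    capacity_per_replica: float = 100.0  # illustrative unit of load

    def local_value(self, action, predicted_load):
        """Negative cost: unmet load (SLA proxy) is weighted more
        heavily than idle capacity (utilization proxy)."""
        replicas = max(1, self.replicas + action)
        capacity = replicas * self.capacity_per_replica
        unmet = max(0.0, predicted_load - capacity)
        idle = max(0.0, capacity - predicted_load)
        return -(5.0 * unmet + 1.0 * idle)

    def act(self, history):
        """Decentralized execution: pick the best action from local data."""
        predicted = forecast_load(history)
        return max(ACTIONS, key=lambda a: self.local_value(a, predicted))


def joint_value(agents, actions, histories):
    """Assumed additive collaborative value: the sum of local values."""
    return sum(
        agent.local_value(action, forecast_load(history))
        for agent, action, history in zip(agents, actions, histories)
    )


agents = [Agent(replicas=2), Agent(replicas=3)]
histories = [[100, 150, 220], [300, 280, 250]]  # made-up load traces
actions = [agent.act(h) for agent, h in zip(agents, histories)]
print(actions, joint_value(agents, actions, histories))
```

With an additive joint value, each agent's locally greedy action also maximizes the joint value, which is why decentralized execution stays consistent with a centrally defined objective in this toy setting. A learned, non-additive value function would not offer that guarantee for free.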

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Big Data and Digital Economy