Resource Management for GPT-based Model Deployed on Clouds: Challenges, Solutions, and Future Directions
Yongkang Dang, Minxian Xu, Kejiang Ye

TL;DR
This paper examines the challenges of resource management for GPT-based models in cloud environments. It proposes potential solutions, a resource management framework, and scheduling algorithms, and outlines future research directions to improve efficiency and sustainability.
Contribution
It introduces a comprehensive resource management framework and tailored scheduling algorithms specifically designed for GPT-based models in cloud settings.
Findings
Identified key resource management challenges for GPT models in clouds.
Proposed a new resource management framework and scheduling algorithms.
Highlighted future research directions for sustainable GPT deployment.
Abstract
The widespread adoption of large language models (LLMs), e.g., the Generative Pre-trained Transformer (GPT), deployed in cloud computing environments (e.g., Azure) has led to a sharp increase in demand for resources. This surge in demand poses significant challenges to resource management in clouds. This paper aims to highlight these challenges by first identifying the unique characteristics of resource management for GPT-based models. Building upon this understanding, we analyze the specific challenges faced by resource management for GPT-based models deployed on clouds, and propose corresponding potential solutions. To facilitate effective resource management, we introduce a comprehensive resource management framework and present resource scheduling algorithms specifically designed for GPT-based models. Furthermore, we delve into the future directions for resource…
Taxonomy
Topics: IoT and Edge/Fog Computing
