Resource Management for GPT-based Model Deployed on Clouds: Challenges, Solutions, and Future Directions

Yongkang Dang; Minxian Xu; Kejiang Ye

arXiv:2308.02970 · cs.DC · August 8, 2023

Open Access

TL;DR

This paper discusses the challenges of resource management for GPT-based models in cloud environments, proposing solutions, a framework, scheduling algorithms, and future research directions to improve efficiency and sustainability.

Contribution

It introduces a comprehensive resource management framework and tailored scheduling algorithms specifically designed for GPT-based models in cloud settings.

Findings

01  Identified key resource management challenges for GPT models in clouds.

02  Proposed a new resource management framework and scheduling algorithms.

03  Highlighted future research directions for sustainable GPT deployment.

Abstract

The widespread adoption of large language models (LLMs), e.g., the Generative Pre-trained Transformer (GPT), deployed in cloud computing environments (e.g., Azure) has led to a sharp increase in demand for resources. This surge in demand poses significant challenges to resource management in clouds. This paper aims to highlight these challenges by first identifying the unique characteristics of resource management for GPT-based models. Building upon this understanding, we analyze the specific challenges faced by resource management for GPT-based models deployed on clouds and propose corresponding potential solutions. To facilitate effective resource management, we introduce a comprehensive resource management framework and present resource scheduling algorithms specifically designed for GPT-based models. Furthermore, we delve into the future directions for resource…

Taxonomy

Topics: IoT and Edge/Fog Computing