MOPAR: A Model Partitioning Framework for Deep Learning Inference Services on Serverless Platforms

Jiaang Duan; Shiyou Qian; Dingyu Yang; Hanwen Hu; Jian Cao; Guangtao Xue

arXiv:2404.02445·cs.DC·April 4, 2024·2 cites

TL;DR

MOPAR is a framework that partitions deep learning models to optimize resource use and reduce latency and costs when deploying inference services on serverless platforms.

Contribution

This paper introduces MOPAR, a novel model partitioning framework that leverages resource usage patterns for efficient deployment of DL inference services on serverless platforms.

Findings

01

Resource efficiency improved by 27.62% on average.

02

Latency reduced by about 5.52%.

03

Cost reduced by a factor of approximately 2.58.

Abstract

With their elasticity and pay-as-you-go cost model, serverless platforms are becoming a prevalent choice for deploying deep learning inference services (DLISs). However, when a DLIS is deployed as a single function on a serverless platform, the varying resource requirements of different layers in the DL model hinder resource utilization and increase costs. To tackle this problem, we propose a model partitioning framework called MOPAR. This work is based on two resource usage patterns of DLISs: global differences and local similarity, which arise from the presence of resource-dominant (RD) operators and layer stacking. Considering these patterns, MOPAR adopts a hybrid approach that first divides the DL model vertically into multiple slices composed of similar layers to improve resource efficiency. Slices containing RD operators are further partitioned into multiple sub-slices,…
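The vertical slicing step described in the abstract can be illustrated with a minimal sketch. It is not MOPAR's actual algorithm: the function name `slice_by_similarity`, the per-layer memory figures, and the 25% tolerance are all hypothetical, chosen only to show how consecutive layers with similar resource use might be grouped into slices.

```python
from typing import List


def slice_by_similarity(layer_mem_mb: List[float],
                        tolerance: float = 0.25) -> List[List[int]]:
    """Group consecutive layer indices into slices of similar memory use.

    A new slice starts when a layer's memory deviates from the running
    mean of the current slice by more than `tolerance` (a fraction).
    This greedy rule is an illustrative assumption, not the paper's method.
    """
    slices: List[List[int]] = []
    current: List[int] = []
    for idx, mem in enumerate(layer_mem_mb):
        if current:
            mean = sum(layer_mem_mb[i] for i in current) / len(current)
            if abs(mem - mean) > tolerance * mean:
                # Resource profile changed noticeably: close the slice.
                slices.append(current)
                current = []
        current.append(idx)
    if current:
        slices.append(current)
    return slices


# Hypothetical per-layer memory footprints in MB.
layer_mem_mb = [100, 105, 98, 400, 410, 120, 118]
slices = slice_by_similarity(layer_mem_mb)
print(slices)
```

With these made-up numbers, the two heavy layers in the middle end up in their own slice, so each slice could in principle be deployed as a function sized to its own needs rather than to the peak of the whole model.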

Taxonomy

Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Cloud Computing and Resource Management