HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions
Jiabin Chen, Fei Xu, Yikun Gu, Li Chen, Fangming Liu, Zhi Zhou

TL;DR
HarmonyBatch is a framework that efficiently batches multi-SLO DNN inference requests on heterogeneous serverless functions, reducing costs and ensuring predictable performance across diverse applications.
Contribution
It introduces a performance and cost model for CPU and GPU serverless DNN inference, and a two-stage batching strategy for multi-SLO requests with heterogeneous functions.
Findings
Reduces serverless DNN inference costs by up to 82.9%.
Guarantees predictable performance for multi-SLO applications.
Provides an effective batching strategy for heterogeneous CPU and GPU functions.
Abstract
Deep Neural Network (DNN) inference on serverless functions is gaining prominence due to its potential for substantial budget savings. Existing works on serverless DNN inference solely optimize batching requests from one application with a single Service Level Objective (SLO) on CPU functions. However, production serverless DNN inference traces indicate that the request arrival rate of applications is surprisingly low, which inevitably causes a long batching time and SLO violations. Hence, there is an urgent need for batching multiple DNN inference requests with diverse SLOs (i.e., multi-SLO DNN inference) in serverless platforms. Moreover, the potential performance and cost benefits of deploying heterogeneous (i.e., CPU and GPU) functions for DNN inference have received scant attention. In this paper, we present HarmonyBatch, a cost-efficient resource provisioning framework designed…
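The link the abstract draws between a low arrival rate and a long batching time can be illustrated with a small back-of-the-envelope sketch. It is not the paper's performance model. It assumes Poisson arrivals, so the first request in a batch of size b waits on average (b - 1) / rate seconds for the batch to fill, and it uses a hypothetical linear latency function (`linear_latency`) with made-up coefficients. All function names and numbers below are illustrative assumptions, and a single shared SLO is used for simplicity.

```python
def expected_fill_time(batch_size: int, rate: float) -> float:
    """Expected seconds the first request waits until the batch is full,
    assuming Poisson arrivals at `rate` requests per second."""
    if batch_size < 1 or rate <= 0:
        raise ValueError("batch_size must be >= 1 and rate must be > 0")
    return (batch_size - 1) / rate


def linear_latency(batch_size: int, base: float = 0.05, per_request: float = 0.01) -> float:
    """Hypothetical inference latency in seconds (an assumption, not the paper's model)."""
    return base + per_request * batch_size


def max_batch_within_slo(rate: float, slo: float, latency=linear_latency, limit: int = 64) -> int:
    """Largest batch size whose expected fill time plus inference latency
    stays within `slo` seconds; returns 0 if even a batch of one misses it."""
    best = 0
    for b in range(1, limit + 1):
        if expected_fill_time(b, rate) + latency(b) <= slo:
            best = b
    return best


if __name__ == "__main__":
    # One application alone at 2 req/s versus several applications sharing
    # a batch at a combined 20 req/s, both against a 0.45 s deadline.
    for rate in (2, 10, 20):
        print(rate, max_batch_within_slo(rate, slo=0.45))
```

Under these assumptions, a single low-rate application can barely batch at all before the deadline, while a higher combined arrival rate makes larger batches feasible. That is the intuition behind grouping requests from several applications; handling their differing SLOs within one batch is the part this sketch leaves out.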
Taxonomy
Topics: Network Security and Intrusion Detection · Privacy-Preserving Technologies in Data
