Towards Designing a Self-Managed Machine Learning Inference Serving System in Public Cloud
Jashwant Raj Gunasekaran, Prashanth Thinakaran, Cyan Subhra Mishra, Mahmut Taylan Kandemir, Chita R. Das

TL;DR
This paper explores the design of a self-managed ML inference serving system in the public cloud that uses reinforcement learning to jointly optimize cost, accuracy, and latency while accounting for both resource and model heterogeneity.
Contribution
It characterizes the cost, accuracy, and latency trade-offs of hosting ML inferences on cloud resources and proposes a reinforcement-learning-based high-level design for a self-managed inference serving system.
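This summary does not say which reinforcement-learning algorithm the design uses. A minimal stand-in, assuming a stateless setting with a fixed menu of choices, is an epsilon-greedy multi-armed bandit (one of the simplest RL techniques): it learns which (model, resource) configuration earns the highest average reward from observed feedback. The `Config` names, the reward shape (accuracy minus cost, with a fixed penalty for missing a latency SLO), and every number below are hypothetical illustrations, not the authors' formulation or measurements.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A hypothetical (model, resource) pairing the agent can choose."""
    model: str
    resource: str


def reward(cost: float, latency_ms: float, accuracy: float,
           slo_ms: float, penalty: float = 10.0) -> float:
    """Hypothetical reward: accuracy minus cost (assumed to be on a
    comparable scale), minus a fixed penalty if the latency SLO is missed."""
    r = accuracy - cost
    if latency_ms > slo_ms:
        r -= penalty
    return r


class EpsilonGreedyAgent:
    """Epsilon-greedy bandit that keeps a running mean reward per config."""

    def __init__(self, configs, epsilon: float = 0.1, seed: int = 0):
        self.configs = list(configs)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {c: 0 for c in self.configs}
        self.values = {c: 0.0 for c in self.configs}

    def select(self) -> Config:
        # Explore a random config with probability epsilon,
        # otherwise exploit the config with the best mean reward so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.configs)
        return max(self.configs, key=lambda c: self.values[c])

    def update(self, config: Config, r: float) -> None:
        # Incremental mean update: v <- v + (r - v) / n
        self.counts[config] += 1
        self.values[config] += (r - self.values[config]) / self.counts[config]


# Made-up profiles for illustration only (NOT measurements from the paper):
# config -> (cost per request, latency in ms, accuracy)
PROFILES = {
    Config("small-model", "cpu-vm"): (0.05, 80.0, 0.70),
    Config("large-model", "cpu-vm"): (0.10, 250.0, 0.80),
    Config("large-model", "gpu-vm"): (0.30, 40.0, 0.80),
}


def train(steps: int = 500, slo_ms: float = 200.0, seed: int = 0) -> EpsilonGreedyAgent:
    """Run the agent against the static, noise-free profiles above."""
    agent = EpsilonGreedyAgent(PROFILES, epsilon=0.1, seed=seed)
    for _ in range(steps):
        config = agent.select()
        cost, latency_ms, accuracy = PROFILES[config]
        agent.update(config, reward(cost, latency_ms, accuracy, slo_ms))
    return agent
```

With a 200 ms SLO, the large model on the CPU VM is penalized for missing the SLO, and the small model's lower cost outweighs the GPU option's accuracy gain, so `train()` settles on `Config("small-model", "cpu-vm")`. A real self-managed system would face noisy, time-varying load, which is where stateful RL methods would go beyond this bandit sketch.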
Findings
Prior work does not address combined model and resource heterogeneity.
A comprehensive evaluation of existing cost-effective prediction-serving methods.
Proposed reinforcement learning approach for adaptive inference management.
Abstract
We are witnessing an increasing trend towards using Machine Learning (ML) based prediction systems, spanning across different application domains, including product recommendation systems, personal assistant devices, facial recognition, etc. These applications typically have diverse requirements in terms of accuracy and response latency, that have a direct impact on the cost of deploying them in a public cloud. Furthermore, the deployment cost also depends on the type of resources being procured, which by themselves are heterogeneous in terms of provisioning latencies and billing complexity. Thus, it is strenuous for an inference serving system to choose from this confounding array of resource types and model types to provide low-latency and cost-effective inferences. In this work we quantitatively characterize the cost, accuracy and latency implications of hosting ML inferences on different public…
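The selection problem the abstract describes can be illustrated as a constrained choice: among candidate (model, resource) options, pick the cheapest one that meets both a latency SLO and an accuracy floor. This is a minimal sketch assuming static, known per-option profiles; the option names, units, and numbers are invented for illustration and are not results from the paper, and the sketch ignores provisioning latency and billing granularity, which the abstract notes also affect cost.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Option:
    """One hypothetical way to host inferences: a model on a resource type."""
    model: str
    resource: str
    cost_per_1k: float   # dollars per 1,000 inferences (hypothetical)
    latency_ms: float    # expected response latency (hypothetical)
    accuracy: float      # model accuracy in [0, 1] (hypothetical)


def cheapest_feasible(options: Iterable[Option], slo_ms: float,
                      min_accuracy: float) -> Optional[Option]:
    """Return the lowest-cost option meeting both the latency SLO and
    the accuracy floor, or None if no option qualifies."""
    feasible = [o for o in options
                if o.latency_ms <= slo_ms and o.accuracy >= min_accuracy]
    return min(feasible, key=lambda o: o.cost_per_1k, default=None)


# Made-up numbers purely for illustration; not results from the paper.
OPTIONS = [
    Option("small-model", "cpu-vm", 0.02, 120.0, 0.70),
    Option("large-model", "cpu-vm", 0.05, 450.0, 0.78),
    Option("large-model", "gpu-vm", 0.09, 60.0, 0.78),
    Option("large-model", "cpu-vm-xl", 0.04, 300.0, 0.78),
]
```

For example, `cheapest_feasible(OPTIONS, slo_ms=200.0, min_accuracy=0.75)` returns the `gpu-vm` option, because it is the only large-model option within the SLO; relaxing the SLO to 500 ms makes the cheaper `cpu-vm-xl` option feasible, and it is selected instead. Even this toy version shows why the choice is hard: tightening either constraint can flip the answer to a different resource type.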
Taxonomy
Topics: Data Stream Mining Techniques · Cloud Computing and Resource Management · IoT and Edge/Fog Computing
