RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances

Baolin Li; Rohan Basu Roy; Tirthak Patel; Vijay Gadepally; Karen Gettings; Devesh Tiwari

arXiv:2207.11434 · cs.DC · July 29, 2022

TL;DR

RIBBON is a deep learning inference serving system that selects a diverse (heterogeneous) pool of cloud instances to meet a QoS target at lower cost than homogeneous instance pools.

Contribution

It introduces a Bayesian Optimization-based strategy for selecting heterogeneous cloud instances to improve inference cost-effectiveness and QoS.
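The selection problem behind this contribution can be illustrated with a small sketch. This is not RIBBON's algorithm: it replaces the Bayesian Optimization search with plain exhaustive enumeration over a tiny search space, and it models the QoS target as a simple throughput requirement, which is a simplification. The instance names, prices, and throughput figures in `CATALOG`, and the helper `cheapest_pool`, are hypothetical values chosen only to show why a mixed pool can cost less than any single-type pool.

```python
from itertools import product

# Hypothetical catalog (assumption, not from the paper): hourly price in
# cents and the queries/sec one instance sustains within the QoS target.
CATALOG = {
    "gpu_large": {"price_cents": 300, "qps": 900},
    "cpu_large": {"price_cents": 100, "qps": 250},
    "cpu_small": {"price_cents": 40, "qps": 90},
}


def pool_cost(pool):
    """Hourly cost in cents of a pool given as {instance_name: count}."""
    return sum(CATALOG[name]["price_cents"] * n for name, n in pool.items())


def pool_capacity(pool):
    """Total queries/sec the pool can serve under the simplified QoS model."""
    return sum(CATALOG[name]["qps"] * n for name, n in pool.items())


def cheapest_pool(target_qps, max_per_type=4, allowed=None):
    """Exhaustively search instance counts; return (cost, pool) or None.

    Exhaustive search stands in for Bayesian Optimization here. It is only
    feasible because the search space is tiny.
    """
    names = list(allowed) if allowed else list(CATALOG)
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        pool = dict(zip(names, counts))
        if pool_capacity(pool) < target_qps:
            continue  # this pool would miss the (simplified) QoS target
        cost = pool_cost(pool)
        if best is None or cost < best[0]:
            best = (cost, pool)
    return best


if __name__ == "__main__":
    target = 1000
    print("heterogeneous:", cheapest_pool(target))
    for name in CATALOG:
        # Restricting the search to one type gives the homogeneous baseline.
        print("homogeneous", name, cheapest_pool(target, allowed=[name]))
```

With these made-up numbers, the cheapest pool that serves 1000 queries/sec mixes one `gpu_large` with two `cpu_small` instances, and costs less than the best single-type pool. In a realistic setting the number of instance types and counts grows quickly and each configuration is expensive to evaluate, which is the motivation for a sample-efficient search such as Bayesian Optimization.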

Findings

01

RIBBON reduces inference costs by up to 16%.

02

It outperforms existing inference serving approaches that use homogeneous instance pools.

03

It is effective for various deep learning models, including recommender system and drug discovery models.

Abstract

Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces RIBBON, a novel deep learning inference serving system that meets two competing objectives: a quality-of-service (QoS) target and cost-effectiveness. The key idea behind RIBBON is to intelligently employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target and maximize cost savings. RIBBON devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms, and it demonstrates its superiority over existing inference serving systems that use homogeneous instance pools. RIBBON saves up to 16% of the inference service cost for different learning models including emerging deep learning recommender…
