RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances
Baolin Li, Rohan Basu Roy, Tirthak Patel, Vijay Gadepally, Karen Gettings, Devesh Tiwari

TL;DR
RIBBON is a deep learning inference serving system that selects a diverse (heterogeneous) pool of cloud instances to meet a QoS target at lower cost than homogeneous instance pools.
Contribution
It introduces a Bayesian Optimization-based strategy for selecting heterogeneous cloud instances to improve inference cost-effectiveness and QoS.
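To make the search problem concrete, here is a minimal sketch of the objective only: pick the cheapest instance pool that still satisfies a service target. It is not RIBBON's method. It uses exhaustive search instead of Bayesian Optimization, and it stands in for QoS with a simple aggregate-throughput check. The instance names, prices, throughput numbers, and function names are all hypothetical and chosen purely for illustration.

```python
from itertools import product

# Hypothetical catalog (illustrative values, not real cloud prices):
# hourly price in cents, and the query rate one instance is assumed to
# sustain while staying within the latency target.
CATALOG = {
    "type_a": {"price_cents": 40, "qps": 100},
    "type_b": {"price_cents": 15, "qps": 30},
    "type_c": {"price_cents": 5, "qps": 8},
}


def pool_cost(pool):
    """Hourly cost in cents of a pool given as {instance_type: count}."""
    return sum(CATALOG[name]["price_cents"] * n for name, n in pool.items())


def pool_capacity(pool):
    """Aggregate query rate of the pool (a simplified stand-in for QoS)."""
    return sum(CATALOG[name]["qps"] * n for name, n in pool.items())


def cheapest_pool(target_qps, max_per_type=5, allowed=None):
    """Exhaustively search pool configurations and return the cheapest one
    whose capacity meets target_qps, or None if no configuration does."""
    names = list(allowed or CATALOG)
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(names)):
        pool = dict(zip(names, counts))
        if pool_capacity(pool) < target_qps:
            continue  # this configuration misses the service target
        if best is None or pool_cost(pool) < pool_cost(best):
            best = pool
    return best


# Compare a mixed pool against a pool restricted to a single instance type.
mixed = cheapest_pool(target_qps=130)
homogeneous = cheapest_pool(target_qps=130, allowed=["type_a"])
print(mixed, pool_cost(mixed))
print(homogeneous, pool_cost(homogeneous))
```

In this toy catalog the single-type pool has to over-provision to reach the target, while the mixed pool can fill the remaining demand with a smaller instance. Exhaustive search only works here because the space is tiny; with many instance types and costly real measurements, a sample-efficient strategy such as the Bayesian Optimization approach named above becomes the natural choice.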
Findings
RIBBON reduces inference costs by up to 16%.
It outperforms existing inference serving approaches that rely on homogeneous instance pools.
It is effective across a range of deep learning models, including recommender systems and drug discovery models.
Abstract
Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces RIBBON, a novel deep learning inference serving system that meets two competing objectives: quality-of-service (QoS) target and cost-effectiveness. The key idea behind RIBBON is to intelligently employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target and maximize cost savings. RIBBON devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms -- and, RIBBON demonstrates its superiority over existing approaches of inference serving systems using homogeneous instance pools. RIBBON saves up to 16% of the inference service cost for different learning models including emerging deep learning recommender…
