KAIROS: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources
Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari

TL;DR
KAIROS is a runtime framework for cloud-based machine learning inference that builds a pool of heterogeneous hardware to deliver up to 2X the throughput of an optimal homogeneous solution and outperform existing schemes under QoS and cost constraints.
Contribution
KAIROS introduces a novel approach to leverage heterogeneous cloud resources without online exploration, optimizing inference query distribution for improved throughput and cost efficiency.
Findings
Up to 2X throughput compared to homogeneous solutions
Outperforms state-of-the-art schemes by up to 70%
Effective utilization of heterogeneous hardware without exploration overhead
Abstract
Online inference is becoming a key service product for many businesses, deployed in cloud platforms to meet customer demands. Despite their revenue-generation capability, these services need to operate under tight Quality-of-Service (QoS) and cost budget constraints. This paper introduces KAIROS, a novel runtime framework that maximizes query throughput while meeting a QoS target and a cost budget. KAIROS designs and implements novel techniques to build a pool of heterogeneous compute hardware without online exploration overhead, and to distribute inference queries optimally at runtime. Our evaluation using industry-grade deep learning (DL) models shows that KAIROS yields up to 2X the throughput of an optimal homogeneous solution, and outperforms state-of-the-art schemes by up to 70%, even when the competing schemes are implemented advantageously so that their exploration overhead is ignored.
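To make the problem statement in the abstract concrete, here is a minimal sketch of choosing a hardware pool that maximizes throughput under a cost budget and a QoS latency target. It is not the KAIROS method: it is a plain exhaustive search over a small, hypothetical instance catalog. The instance names, prices, throughputs, and latencies are all made-up assumptions, and it assumes throughput simply adds up across instances.

```python
from itertools import product

# Hypothetical instance catalog (all numbers are illustrative assumptions):
# hourly price in USD, sustained queries per second, and tail latency in ms.
CATALOG = {
    "gpu_large": {"price": 3.0, "qps": 130.0, "latency_ms": 20.0},
    "gpu_small": {"price": 1.0, "qps": 40.0, "latency_ms": 45.0},
    "cpu": {"price": 0.5, "qps": 10.0, "latency_ms": 120.0},
}


def best_config(catalog, budget, qos_ms, max_count=10):
    """Exhaustively search instance counts for the highest total throughput.

    Only instance types whose latency meets the QoS target are considered,
    and the total hourly price must not exceed the budget. Throughput is
    assumed to be additive across instances, which is a simplification.
    """
    eligible = {n: s for n, s in catalog.items() if s["latency_ms"] <= qos_ms}
    names = sorted(eligible)
    best, best_qps = {}, 0.0
    for counts in product(range(max_count + 1), repeat=len(names)):
        cost = sum(c * eligible[n]["price"] for n, c in zip(names, counts))
        if cost > budget:
            continue  # over the cost budget
        qps = sum(c * eligible[n]["qps"] for n, c in zip(names, counts))
        if qps > best_qps:
            best = {n: c for n, c in zip(names, counts) if c > 0}
            best_qps = qps
    return best, best_qps


if __name__ == "__main__":
    config, qps = best_config(CATALOG, budget=7.0, qos_ms=50.0)
    print(config, qps)
```

With these made-up numbers and a budget of 7.0, the mixed pool of two `gpu_large` and one `gpu_small` reaches 300 queries per second, while the best single-type pools reach 280 (seven `gpu_small`) and 260 (two `gpu_large`). This shows in miniature why a heterogeneous pool can beat a homogeneous one at the same cost.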
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Algorithms · Cloud Computing and Resource Management
