Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures
Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu

TL;DR
JellyBean is a system that optimizes machine learning inference workflows across heterogeneous infrastructures, reducing costs and improving efficiency by selecting appropriate models and deployment strategies based on service objectives.
Contribution
The paper introduces JellyBean, a novel system that effectively manages ML inference across diverse infrastructure tiers, considering model tradeoffs and deployment costs.
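To make the idea of weighing model tradeoffs against deployment costs concrete, here is a minimal sketch. It is not JellyBean's actual optimizer: the names `ModelVariant`, `Tier`, and `cheapest_plan`, the cost formula, and every number are hypothetical and chosen only for illustration. It enumerates each (model, tier) pair and keeps the cheapest one that meets an accuracy target.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ModelVariant:
    """Hypothetical model variant with an accuracy/efficiency tradeoff."""
    name: str
    accuracy: float          # fraction in [0, 1]
    work_per_request: float  # abstract compute units per inference


@dataclass(frozen=True)
class Tier:
    """Hypothetical infrastructure tier (e.g. edge device, cloud datacenter)."""
    name: str
    cost_per_unit: float  # cost per compute unit (made-up scale)
    network_cost: float   # cost per request to ship the input to this tier


def cheapest_plan(
    models: Sequence[ModelVariant],
    tiers: Sequence[Tier],
    min_accuracy: float,
) -> Optional[Tuple[ModelVariant, Tier, float]]:
    """Return the lowest-cost (model, tier, cost) meeting the accuracy target.

    Assumed cost model: compute cost plus network cost per request.
    Returns None when no model variant is accurate enough.
    """
    best = None
    for model, tier in product(models, tiers):
        if model.accuracy < min_accuracy:
            continue  # violates the accuracy objective
        cost = model.work_per_request * tier.cost_per_unit + tier.network_cost
        if best is None or cost < best[2]:
            best = (model, tier, cost)
    return best


# Illustrative inputs only; none of these figures come from the paper.
MODELS = [
    ModelVariant("small", accuracy=0.80, work_per_request=1.0),
    ModelVariant("large", accuracy=0.92, work_per_request=4.0),
]
TIERS = [
    Tier("edge", cost_per_unit=0.2, network_cost=0.0),
    Tier("cloud", cost_per_unit=0.1, network_cost=0.3),
]

relaxed = cheapest_plan(MODELS, TIERS, min_accuracy=0.75)
strict = cheapest_plan(MODELS, TIERS, min_accuracy=0.90)
```

With these made-up inputs, the relaxed target is met most cheaply by the small model on the edge tier, while the strict target moves the large model to the cloud tier, where cheaper compute outweighs the network cost. Exhaustive enumeration is only practical for a toy search space like this one.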
Findings
Reduces visual question answering serving costs by up to 58%.
Decreases vehicle tracking costs by up to 36%.
Outperforms prior ML serving systems by up to 5x in cost efficiency.
Abstract
With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network. Existing machine learning inference platforms typically assume a homogeneous infrastructure and do not take into account the more complex and tiered computing infrastructure that includes edge devices, local hubs, edge datacenters, and cloud datacenters. On the other hand, recent AutoML efforts have provided viable solutions for model compression, pruning and quantization for heterogeneous environments; for a machine learning model, now we may easily find or even generate a series of models with different tradeoffs between accuracy and efficiency. We design and implement JellyBean, a system for serving and optimizing machine learning inference workflows on heterogeneous infrastructures. Given service-level…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning and Data Classification · IoT and Edge/Fog Computing
Methods: Pruning
