A Tale of Two Scales: Reconciling Horizontal and Vertical Scaling for Inference Serving Systems
Kamran Razavi, Mehran Salmani, Max Mühlhäuser, Boris Koldehofe, Lin Wang

TL;DR
This paper presents Themis, a hybrid autoscaling system for inference serving that combines horizontal and vertical scaling strategies to improve performance and resource efficiency under varying workloads.
Contribution
Themis introduces a two-stage autoscaling approach that dynamically switches between vertical and horizontal scaling based on workload conditions, optimizing inference system performance.
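This summary does not spell out the policy Themis actually uses, so the following is only a minimal sketch of what a "vertical first, horizontal when needed" decision could look like. Everything in it is a hypothetical illustration rather than the authors' method: the function name `plan_scaling`, the `ScalingDecision` type, the per-core throughput parameter, and the assumption that throughput grows linearly with cores. Real models parallelize imperfectly, which is the limitation of vertical scaling the abstract points out.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingDecision:
    cores_per_instance: int
    instances: int
    mode: str  # "none", "vertical", or "horizontal"


def plan_scaling(request_rate: float,
                 rate_per_core: float,
                 cores_per_instance: int,
                 instances: int,
                 max_cores_per_instance: int) -> ScalingDecision:
    """Hypothetical hybrid policy; NOT the algorithm from the paper.

    Assumes each core serves `rate_per_core` requests per second and that
    throughput scales linearly with cores (a deliberate simplification).
    """
    needed_cores = math.ceil(request_rate / rate_per_core)
    current_cores = cores_per_instance * instances

    # Enough capacity already: leave the deployment unchanged.
    if needed_cores <= current_cores:
        return ScalingDecision(cores_per_instance, instances, "none")

    # Stage 1: try to absorb the load by resizing the existing instances,
    # which avoids the cold start of launching new ones.
    per_instance = math.ceil(needed_cores / instances)
    if per_instance <= max_cores_per_instance:
        return ScalingDecision(per_instance, instances, "vertical")

    # Stage 2: the per-instance cap is exhausted, so add instances
    # at the maximum size instead.
    new_instances = math.ceil(needed_cores / max_cores_per_instance)
    return ScalingDecision(max_cores_per_instance, new_instances, "horizontal")
```

For example, with 3 instances of 4 cores each, a cap of 8 cores per instance, and 10 requests per second per core, a load of 200 requests per second is handled by resizing to 7 cores per instance, while 400 requests per second exceeds the cap and leads to 5 instances of 8 cores.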
Findings
Achieves over 10x reduction in SLO violations compared to existing methods.
Effectively balances resource efficiency and performance under real-world workloads.
Demonstrates significant improvements in inference serving systems through extensive evaluations.
Abstract
Inference serving is of great importance in deploying machine learning models in real-world applications, ensuring efficient processing and quick responses to inference requests. However, managing resources in these systems poses significant challenges, particularly in maintaining performance under varying and unpredictable workloads. Two primary scaling strategies, horizontal and vertical scaling, offer different advantages and limitations. Horizontal scaling adds more instances to handle increased loads but can suffer from cold start issues and increased management complexity. Vertical scaling boosts the capacity of existing instances, allowing for quicker responses but is limited by hardware and model parallelization capabilities. This paper introduces Themis, a system designed to leverage the benefits of both horizontal and vertical scaling in inference serving systems. Themis…
Taxonomy
Topics: Robotics and Automated Systems · Context-Aware Activity Recognition Systems · Semantic Web and Ontologies
