A Performance Analyzer for a Public Cloud's ML-Augmented VM Allocator
Roozbeh Bostandoost, Pooria Namyar, Siva Kesava Reddy Kakarla, Ryan Beckett, Santiago Segarra, Eli Cortez, Ankur Mallick, Kevin Hsieh, Rodrigo Fonseca, Mohammad Hajiesmaili, Behnaz Arzani

TL;DR
SANJESH is a bi-level optimization tool that systematically stress-tests ML models in cloud VM allocators, revealing adverse interactions and worst-case scenarios not detectable by traditional methods.
Contribution
It introduces a probabilistic adversarial analyzer that uncovers harmful model interactions in cloud VM placement pipelines, improving robustness testing.
Findings
Uncovered scenarios that cause 4x worse performance than those found by existing evaluators.
Demonstrated effectiveness on production traces from a cloud operator.
Revealed complex model interactions that degrade cloud performance.
Abstract
Cloud operators increasingly deploy multiple ML models in their VM allocation pipelines. In such settings, individually benign predictions can shift and compound, severely degrading performance. In a cloud provider's VM placement pipeline, CPU, memory, and lifetime prediction models jointly determine server count, live migration frequency, and network utilization; yet no existing approach can systematically stress-test how these models adversely interact. Deterministic adversarial analyzers cannot capture probabilistic ML behavior, so operators miss failures that arise only from correlated distributional shifts across models. In SANJESH, we formulate a bi-level optimization that captures how the ML models behave statistically and uncovers how they adversely interact. The outer level searches over what predictions the ML models could produce under distributional uncertainty to find…
