Intelligent Resource Scheduling for Co-located Latency-critical Services: A Multi-Model Collaborative Learning Approach
Lei Liu

TL;DR
This paper introduces OSML, a multi-model machine learning-based scheduler that improves resource allocation for co-located latency-critical services by avoiding resource cliffs and enhancing QoS stability.
Contribution
It presents a novel collaborative ML approach that predicts QoS variations and intelligently guides resource scheduling to improve efficiency and stability in cloud environments.
Findings
Supports higher load levels with QoS guarantees
Reduces scheduling overhead and convergence time
Effectively avoids resource cliffs during scheduling
Abstract
Latency-critical services have been widely deployed in cloud environments. For cost efficiency, multiple services are usually co-located on a server, so run-time resource scheduling becomes pivotal for QoS control in these complicated co-location cases. However, the scheduling exploration space grows rapidly as server resources increase, making it hard for schedulers to find ideal solutions quickly. More importantly, we observe that there are "resource cliffs" in the scheduling exploration space. They affect exploration efficiency and always lead to severe QoS fluctuations, and previous schedulers cannot easily avoid them. To address these problems, we propose a novel ML-based intelligent scheduler, OSML. It learns the correlation between architectural hints (e.g., IPC, cache misses, memory footprint), scheduling solutions, and the QoS demands…
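To make the idea of a "resource cliff" concrete, here is a minimal Python sketch. It is not OSML and is not taken from the paper. It assumes one reading of the term that the excerpt above does not spell out: a cliff is a point in the allocation space where taking away a single unit of a resource multiplies latency by a large factor. The latency model (`toy_latency_ms`), the helper `find_cliffs`, and the `jump_ratio` threshold are all hypothetical and invented for illustration.

```python
from itertools import product


def toy_latency_ms(cores: int, cache_ways: int) -> float:
    """Hypothetical latency model, invented for illustration only."""
    # Latency stays flat while the allocation covers a made-up demand
    # (4 cores, 6 cache ways) and grows sharply once it drops below it.
    base = 5.0
    penalty = 1.0
    if cores < 4:
        penalty *= 10.0 ** (4 - cores)      # steep: 10x per missing core
    if cache_ways < 6:
        penalty *= 3.0 ** (6 - cache_ways)  # milder: 3x per missing way
    return base * penalty


def find_cliffs(max_cores, max_ways, latency=toy_latency_ms, jump_ratio=5.0):
    """Return (from_alloc, to_alloc) pairs where removing one unit of a
    resource multiplies latency by at least jump_ratio."""
    cliffs = []
    # Start at 2 so that the neighbor with one unit less is still >= 1.
    for cores, ways in product(range(2, max_cores + 1), range(2, max_ways + 1)):
        here = latency(cores, ways)
        for neighbor in ((cores - 1, ways), (cores, ways - 1)):
            if latency(*neighbor) / here >= jump_ratio:
                cliffs.append(((cores, ways), neighbor))
    return cliffs


if __name__ == "__main__":
    for src, dst in find_cliffs(max_cores=8, max_ways=10):
        print(f"cliff: {src} -> {dst}")
```

In this toy model only the core dimension produces cliffs at the chosen threshold, because each missing core costs 10x while each missing cache way costs 3x. A scheduler that explores the space one step at a time can step across such an edge without warning, which is the kind of QoS fluctuation the abstract describes.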
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Software System Performance and Reliability
