Practical Scheduling for Real-World Serverless Computing
Kostis Kaffes, Neeraja J. Yadwadkar, Christos Kozyrakis

TL;DR
This paper introduces Hermes, a novel scheduler for serverless computing that optimizes function execution and load balancing, significantly reducing delays and increasing throughput based on real-world trace analysis.
Contribution
The paper presents a first-principles design of a serverless scheduler, Hermes, incorporating insights from real-world traces to improve performance over existing policies.
Findings
Hermes reduces function slowdown by up to 85% compared with existing scheduling policies.
Hermes achieves 60% higher throughput than those policies.
Hermes minimizes cold starts and avoids head-of-line blocking.
Abstract
Serverless computing has seen rapid growth due to the ease-of-use and cost-efficiency it provides. However, function scheduling, a critical component of serverless systems, has been overlooked. In this paper, we take a first-principles approach toward designing a scheduler that caters to the unique characteristics of serverless functions as seen in real-world deployments. We first create a taxonomy of scheduling policies along three dimensions. Next, we use simulation to explore the scheduling policy space for the function characteristics in a 14-day trace of Azure functions and conclude that frequently used features such as late binding and random load balancing are sub-optimal for common execution time distributions and load ranges. We use these insights to design Hermes, a scheduler for serverless functions with three key characteristics. First, to avoid head-of-line blocking due to…
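To make the abstract's point about random load balancing concrete, here is a minimal toy sketch. It is not the paper's simulator and it does not implement Hermes. It assigns each invocation to a worker on arrival and compares two balancing rules, uniformly random and least pending work, by mean slowdown (response time divided by execution time). Everything here is an assumption chosen for illustration: the function names `make_jobs` and `mean_slowdown`, the bimodal execution times (90% short, 10% long), the 4 workers, the 70% load, and FIFO service at each worker.

```python
import random

def make_jobs(n, load, workers, rng):
    """Generate (arrival, service) pairs: Poisson arrivals, bimodal service times.

    Assumed workload: 90% of jobs take 1 time unit, 10% take 10 units.
    """
    mean_service = 0.9 * 1.0 + 0.1 * 10.0
    rate = load * workers / mean_service  # arrival rate that yields the target load
    t, jobs = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        service = 10.0 if rng.random() < 0.1 else 1.0
        jobs.append((t, service))
    return jobs

def mean_slowdown(jobs, workers, pick):
    """Bind each job to a worker on arrival; each worker serves its jobs FIFO."""
    free_at = [0.0] * workers  # time at which each worker finishes its queued work
    total = 0.0
    for arrival, service in jobs:
        w = pick(free_at)
        start = max(arrival, free_at[w])
        free_at[w] = start + service
        total += (free_at[w] - arrival) / service  # slowdown is always >= 1
    return total / len(jobs)

WORKERS = 4
jobs = make_jobs(5000, load=0.7, workers=WORKERS, rng=random.Random(0))

pick_rng = random.Random(1)

def pick_random(free_at):
    """Random load balancing: ignore worker state entirely."""
    return pick_rng.randrange(len(free_at))

def pick_least_work(free_at):
    """Least pending work: choose the worker that frees up earliest."""
    return min(range(len(free_at)), key=free_at.__getitem__)

random_sd = mean_slowdown(jobs, WORKERS, pick_random)
least_sd = mean_slowdown(jobs, WORKERS, pick_least_work)
print(f"random balancing:   mean slowdown {random_sd:.2f}")
print(f"least pending work: mean slowdown {least_sd:.2f}")
```

Under these assumptions, random balancing often queues short invocations behind long ones while other workers sit idle, which is the head-of-line blocking the abstract refers to. The exact numbers depend on the assumed workload and should not be read as results from the paper.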
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Distributed and Parallel Computing Systems
