TL;DR
This study analyzes a month-long trace of 85 billion user requests and 11.9 million cold starts from five data centers of Huawei's serverless cloud platform, identifies the key factors that influence cold start frequency and duration, and proposes mitigation strategies to improve serverless platform performance.
Contribution
It provides the first comprehensive analysis of cold start behaviors using large-scale real-world data and introduces the pod utility ratio as a new metric for understanding cold start impact.
Findings
Cold start times vary significantly across regions and are influenced by trigger types and resource allocations.
Dependency deployment and scheduling are major contributors to cold start duration.
Multi-region scheduling offers opportunities to reduce cold start frequency and duration.
Abstract
This paper releases and analyzes a month-long trace of 85 billion user requests and 11.9 million cold starts from Huawei's serverless cloud platform. Our analysis spans workloads from five data centers. We focus on cold starts and provide a comprehensive examination of the underlying factors influencing the number and duration of cold starts. These factors include trigger types, request synchronicity, runtime languages, and function resource allocations. We investigate components of cold starts, including pod allocation time, code and dependency deployment time, and scheduling delays, and examine their relationships with runtime languages, trigger types, and resource allocation. We introduce pod utility ratio to measure the pod's useful lifetime relative to its cold start time, giving a more complete picture of cold starts, and see that some pods with long cold start times have longer…
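The pod utility ratio described in the abstract can be illustrated with a minimal sketch. The abstract only says the metric relates a pod's useful lifetime to its cold start time; the exact formula, the function name `pod_utility_ratio`, the choice of seconds as the unit, and the sample values below are assumptions made for illustration, not the paper's definition or data.

```python
def pod_utility_ratio(useful_lifetime_s: float, cold_start_s: float) -> float:
    """Ratio of a pod's useful lifetime to its cold start time.

    Assumed form: useful_lifetime / cold_start_time, both in seconds.
    The paper may define "useful lifetime" more precisely than this sketch does.
    """
    if cold_start_s <= 0:
        raise ValueError("cold start time must be positive")
    if useful_lifetime_s < 0:
        raise ValueError("useful lifetime must be non-negative")
    return useful_lifetime_s / cold_start_s


# Hypothetical pods: a slow cold start can still pay off if the pod lives long.
slow_but_long_lived = pod_utility_ratio(useful_lifetime_s=600.0, cold_start_s=3.0)
fast_but_short_lived = pod_utility_ratio(useful_lifetime_s=30.0, cold_start_s=0.5)

print(slow_but_long_lived)   # 200.0
print(fast_but_short_lived)  # 60.0
```

Under this assumed form, a pod with a long cold start can still score higher than a pod with a short one, which is why the ratio may give a fuller picture than cold start time alone.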
