Failure Analysis of Big Cloud Service Providers Prior to and During Covid-19 Period
Muhammad Ahsan, Sacheendra Talluri, Alexandru Iosup

TL;DR
This paper analyzes failures of major cloud service providers before and during the Covid-19 pandemic, highlighting why understanding failure causes is important for improving cloud reliability during critical periods.
Contribution
It provides a detailed failure analysis based on failure data reported directly by the vendors, with a focus on failures during the Covid-19 period. This differs from prior studies, which relied on sources such as news articles.
Findings
Identification of failure patterns during Covid-19
Insights into causes of high-severity cloud failures
Recommendations for improving cloud resilience
Abstract
Cloud services are important for societal functions such as healthcare, commerce, entertainment and education. The cloud offers a variety of benefits, such as increased collaboration and inexpensive computing. Because of their large size and complexity, cloud services inevitably experience failures, which reduce reliability and efficiency. For example, bugs have caused many high-severity failures in the cloud infrastructure of popular providers, leading to outages of several hours and the unrecoverable loss of user data. Prior studies of cloud failures are limited and rely on sources such as news articles. A detailed study focused on cloud failures is therefore needed, one that analyzes failure data gathered directly from the vendors. Furthermore, Covid-19-era cloud failures should be studied, as cloud services played a major role throughout the…
Taxonomy
Topics: Cloud Data Security Solutions · Cloud Computing and Resource Management · IoT and Edge/Fog Computing
