TL;DR
This paper introduces a deep learning-based clustering method to analyze cloud system failures, reducing manual effort and improving the accuracy of failure mode identification in complex, noisy data.
Contribution
It presents a novel application of Deep Embedded Clustering to cloud failure data, eliminating the need for manual feature engineering and enhancing failure analysis.
Findings
Deep Embedded Clustering achieves comparable or better cluster purity than manual methods.
The approach reduces the need for domain expertise in failure analysis.
The distribution of failure modes recovered by the clustering aligns more closely with their actual frequencies.
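Cluster purity, the metric behind the first finding, can be illustrated with a short sketch. It uses the standard definition (each cluster is credited with its most frequent ground-truth label, and the credited counts are summed and divided by the number of samples). The function name and the toy failure-mode labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, true_labels):
    """Return the fraction of samples that carry the majority label of their cluster.

    cluster_ids and true_labels are parallel sequences: one cluster assignment
    and one ground-truth failure mode per failure record.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("purity is undefined for an empty data set")

    # Count how often each ground-truth label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts[cluster][label] += 1

    # Each cluster contributes the size of its largest label group.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(cluster_ids)


# Hypothetical example: six failure records grouped into three clusters.
clusters = [0, 0, 0, 1, 1, 2]
modes = ["crash", "crash", "hang", "hang", "hang", "crash"]
purity = cluster_purity(clusters, modes)  # 5 of 6 records match their cluster's majority label
```

Purity rises trivially as the number of clusters grows (one cluster per sample gives a purity of 1.0), so it is only meaningful when the compared methods use a similar number of clusters.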
Abstract
Identifying the failure modes of cloud computing systems is a difficult and time-consuming task, due to the growing complexity of such systems, and the large volume and noisiness of failure data. This paper presents a novel approach for analyzing failure data from cloud systems, in order to relieve human analysts from manually fine-tuning the data for feature engineering. The approach leverages Deep Embedded Clustering (DEC), a family of unsupervised clustering algorithms based on deep learning, which uses an autoencoder to optimize data dimensionality and inter-cluster variance. We applied the approach in the context of the OpenStack cloud computing platform, both on the raw failure data and in combination with an anomaly detection pre-processing algorithm. The results show that the performance of the proposed approach, in terms of purity of clusters, is comparable to, or in some cases…
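As a rough illustration of the clustering step in DEC, the sketch below follows the original DEC formulation (soft assignments from a Student's t kernel, a sharpened target distribution, and a KL-divergence objective). The abstract describes DEC as a family of algorithms, so the variant used in the paper may differ. The autoencoder is omitted: the embeddings are assumed to be the encoder's output, and all function names and toy values are hypothetical.

```python
import math


def soft_assign(embeddings, centroids, alpha=1.0):
    """Soft cluster assignments q[i][j] using a Student's t kernel.

    q_ij is proportional to (1 + ||z_i - mu_j||^2 / alpha) ** (-(alpha + 1) / 2),
    normalized so that each row sums to 1.
    """
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """Sharpened targets p[i][j]: square q, divide by the soft cluster
    frequency, then renormalize each row."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """KL(P || Q) summed over all samples and clusters: the clustering loss."""
    return sum(
        p_ij * math.log(p_ij / q_ij)
        for p_row, q_row in zip(p, q)
        for p_ij, q_ij in zip(p_row, q_row)
        if p_ij > 0.0
    )


# Hypothetical 2-D embeddings (stand-ins for encoder output) and two centroids.
embeddings = [[0.0, 0.0], [0.5, 0.0], [4.0, 4.0]]
centroids = [[0.0, 0.0], [4.0, 4.0]]
q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full training loop, this loss would be minimized by gradient descent with respect to both the encoder weights and the centroids, with the target distribution recomputed periodically. That part needs a deep learning framework and is left out of this sketch.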
