FIRED: a fine-grained robust performance diagnosis framework for cloud applications
Ruyue Xin, Hongyun Liu, Peng Chen, Paola Grosso, Zhiming Zhao

TL;DR
FIRED is a framework that improves performance anomaly detection and root cause localization in cloud applications using ensemble deep learning and fine-grained analysis, achieving high accuracy with limited labels.
Contribution
The paper introduces FIRED, a novel framework combining ensemble models and weakly-supervised learning for robust, fine-grained performance diagnosis in cloud environments.
Findings
Achieves an anomaly detection F1 score above 0.8 within 4 minutes.
Locates the first four root causes with over 70% accuracy.
Outperforms existing models in robustness and re-usability.
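The page does not say how these two metrics are computed. As a rough illustration only, this sketch assumes the standard binary F1 score over per-window anomaly labels, and reads "locates the first four root causes" as top-k accuracy with k = 4. The function names and the metric@service candidate strings are hypothetical, not taken from the paper.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 over per-window labels, where 1 marks an anomalous window."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_accuracy(
    rankings: Sequence[Sequence[str]], true_causes: Sequence[str], k: int
) -> float:
    """Fraction of anomaly cases whose true root cause is among the first k ranked candidates."""
    if not true_causes:
        return 0.0
    hits = sum(1 for ranked, cause in zip(rankings, true_causes) if cause in ranked[:k])
    return hits / len(true_causes)
```

For instance, f1_score([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]) gives about 0.667 (two correct detections, one false alarm, one miss). Under this reading, the reported result would correspond to top_k_accuracy(..., k=4) exceeding 0.7.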
Abstract
To run a cloud application with the required service quality, operators have to continuously monitor the cloud application's run-time status, detect potential performance anomalies, and diagnose the root causes of anomalies. However, existing models of performance anomaly detection often suffer from low re-usability and low robustness due to the diversity of the system-level metrics being monitored and the lack of high-quality labeled monitoring data for anomalies. Moreover, current coarse-grained analysis models make it difficult to locate the system-level root causes of application performance anomalies for effective adaptation decisions. We provide a FIne-grained Robust pErformance Diagnosis (FIRED) framework to tackle these challenges. The framework offers an ensemble of several well-selected base models for anomaly detection using a deep neural network, which adopts weakly-supervised…
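The abstract describes an ensemble of base anomaly detectors whose outputs are combined by a deep neural network trained with limited labels. What follows is a minimal sketch of that stacking idea under loud assumptions. The two base detectors (zscore_detector, range_detector) are hypothetical. The combiner is a single logistic unit trained by stochastic gradient descent on a small labeled set. It is a plain stand-in for FIRED's deep network, not its weakly supervised training procedure.

```python
import math
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

Window = Sequence[float]  # one metric's recent samples; the last value is the point under test


# Hypothetical base detector 1: distance of the newest sample from the window mean, in std devs.
def zscore_detector(window: Window) -> float:
    history = window[:-1]
    mu, sigma = mean(history), pstdev(history)
    return abs(window[-1] - mu) / (sigma + 1e-9)


# Hypothetical base detector 2: size of the newest step relative to the historical range.
def range_detector(window: Window) -> float:
    history = window[:-1]
    span = max(history) - min(history)
    return abs(window[-1] - window[-2]) / (span + 1e-9)


BASE_DETECTORS: List[Callable[[Window], float]] = [zscore_detector, range_detector]
SCORE_CAP = 10.0  # clip base scores so one exploding detector cannot dominate training


def features(window: Window) -> List[float]:
    """Stack the (clipped) base-detector scores into one feature vector."""
    return [min(d(window), SCORE_CAP) for d in BASE_DETECTORS]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def anomaly_prob(w: List[float], b: float, window: Window) -> float:
    """Combiner output: probability that the newest sample is anomalous."""
    x = features(window)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_combiner(
    windows: Sequence[Window], labels: Sequence[int], lr: float = 0.1, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights over base scores with SGD on a small labeled set."""
    w, b = [0.0] * len(BASE_DETECTORS), 0.0
    for _ in range(epochs):
        for window, y in zip(windows, labels):
            x = features(window)
            err = anomaly_prob(w, b, window) - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w: List[float], b: float, window: Window, threshold: float = 0.5) -> int:
    """Return 1 if the ensemble flags the newest sample as anomalous, else 0."""
    return int(anomaly_prob(w, b, window) >= threshold)
```

Clipping base scores at SCORE_CAP keeps the gradient steps bounded when a detector's score blows up, for example when a window's standard deviation is near zero. Swapping the logistic unit for a small neural network would keep the same stacking structure.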
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Anomaly Detection Techniques and Applications
