Modeling Anomaly Detection in Cloud Services: Analysis of the Properties that Impact Latency and Resource Consumption
Gabriel Job Antunes Grabher (KRAKOS), Fumio Machida, Thomas Ropars (KRAKOS)

TL;DR
This paper models how different properties of anomaly detectors in cloud services affect latency and resource use, providing insights into optimizing detection strategies for better performance-cost balance.
Contribution
It introduces a stochastic model, built with Stochastic Reward Nets, to analyze how detector precision, recall, and inspection frequency affect the average latency and resource consumption of a cloud service.
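For readers less familiar with the two accuracy metrics named above, the following minimal sketch computes them from detection counts. The function name and its arguments are illustrative and do not come from the paper; only the standard definitions of precision and recall are assumed.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a detector, given raw counts.

    precision: share of raised alarms that were real anomalies.
    recall:    share of real anomalies that raised an alarm.
    Names are hypothetical; they are not taken from the paper.
    """
    flagged = true_positives + false_positives   # all alarms raised
    actual = true_positives + false_negatives    # all real anomalies
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# 10 alarms raised, 8 of them correct; 16 real anomalies in total.
print(precision_recall(8, 2, 8))  # (0.8, 0.5)
```

In the setting of the paper, a false positive corresponds to a scaling action that was not needed (extra cost), while a false negative corresponds to an anomaly left unresolved until a later inspection (extra latency).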
Findings
When detection can run frequently, high precision is enough to obtain a good performance-cost balance.
Infrequent detection makes recall more critical for performance.
Optimal detection parameters depend on detection frequency.
Abstract
Detecting and resolving performance anomalies in Cloud services is crucial for maintaining desired performance objectives. Scaling actions triggered by an anomaly detector help achieve target latency at the cost of extra resource consumption. However, performance anomaly detectors make mistakes. This paper studies which characteristics of performance anomaly detection are important to optimize the trade-off between performance and cost. Using Stochastic Reward Nets, we model a Cloud service monitored by a performance anomaly detector. Using our model, we study the impact of detector characteristics, namely precision, recall and inspection frequency, on the average latency and resource consumption of the monitored service. Our results show that achieving a high precision and a high recall is not always necessary. If detection can be run frequently, a high precision is enough to obtain a…
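The interplay between detector accuracy and inspection frequency can be illustrated with a toy discrete-time simulation. This is not the authors' Stochastic Reward Net model, and it reproduces none of their results; every name and parameter value below (`simulate`, `anomaly_prob`, `false_alarm_prob`, and so on) is an assumption made for illustration. Precision is not set directly here: it emerges from the false-alarm probability and how often anomalies occur.

```python
import random


def simulate(recall, false_alarm_prob, inspect_every,
             steps=100_000, anomaly_prob=0.01, seed=0):
    """Toy model of a service watched by a periodic anomaly detector.

    Returns (fraction of steps spent in an anomalous state,
             scaling actions per step).
    All parameters are hypothetical, not values from the paper.
    """
    rng = random.Random(seed)
    anomalous = False
    anomalous_steps = 0
    scaling_actions = 0
    for t in range(steps):
        # An anomaly may start at any step while the service is healthy.
        if not anomalous and rng.random() < anomaly_prob:
            anomalous = True
        if anomalous:
            anomalous_steps += 1
        # The detector only inspects the service every `inspect_every` steps.
        if t % inspect_every == 0:
            if anomalous:
                # True positive with probability `recall`: scale and recover.
                if rng.random() < recall:
                    anomalous = False
                    scaling_actions += 1
            elif rng.random() < false_alarm_prob:
                # False positive: an unnecessary scaling action (pure cost).
                scaling_actions += 1
    return anomalous_steps / steps, scaling_actions / steps


if __name__ == "__main__":
    for every in (1, 50):
        for rec in (0.95, 0.2):
            degraded, cost = simulate(rec, 0.01, every)
            print(f"inspect_every={every:>2} recall={rec:.2f} "
                  f"degraded={degraded:.3f} scaling_rate={cost:.4f}")
```

In this sketch, time spent in the anomalous state stands in for degraded latency and the scaling rate stands in for resource consumption. Varying `recall`, `false_alarm_prob`, and `inspect_every` shows the kind of trade-off the paper analyzes, but any conclusions should be drawn from the paper's model rather than from this toy.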
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Network Security and Intrusion Detection
