Challenges and Solutions to Build a Data Pipeline to Identify Anomalies in Enterprise System Performance
Xiaobo Huang, Amitabha Banerjee, Chien-Chia Chen, Chengzhi Huang, Tzu Yi Chuang, Abhishek Srivastava, Razvan Cheveresan

TL;DR
This paper discusses how VMware addresses data challenges like label scarcity and data drift to improve the accuracy and stability of ML-based anomaly detection in enterprise data centers.
Contribution
The paper presents solutions to data challenges in deploying anomaly detection systems, resulting in a 30% accuracy improvement and sustained model performance over time.
Findings
30% increase in anomaly detection accuracy
Model performance remains stable over time
Successful deployment in production environment
Abstract
We discuss how VMware is solving the following data challenges in operating our ML-based anomaly detection system, which detects performance issues in our Software Defined Data Center (SDDC) enterprise deployments: (i) label scarcity and label bias due to heavy dependency on unscalable human annotators, and (ii) data drift due to ever-changing workload patterns, software stack and underlying hardware. Our anomaly detection system has been deployed in production for many years and has successfully detected numerous major performance issues. We demonstrate that by addressing these data challenges, we not only improve the accuracy of our performance anomaly detection model by 30%, but also ensure that the model's performance does not degrade over time.
Taxonomy
Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection
