Ensemble Method for System Failure Detection Using Large-Scale Telemetry Data
Priyanka Mudgal, Rita H. Wouhaybi

TL;DR
This paper introduces an ensemble machine learning approach combining LSTM, isolation forests, OCSVM, and LOF to detect system failures from large-scale telemetry data, improving system reliability.
Contribution
It presents a novel ensemble methodology integrating multiple algorithms for effective failure detection using extensive telemetry data.
Findings
High detection rate of system failures
Effective identification of failure patterns
Enhanced system reliability insights
Abstract
The growing reliance on computer systems, particularly personal computers (PCs), necessitates heightened reliability to uphold user satisfaction. This research paper presents an in-depth analysis of extensive system telemetry data, proposing an ensemble methodology for detecting system failures. Our approach entails scrutinizing various parameters of system metrics, encompassing CPU utilization, memory utilization, disk activity, CPU temperature, and pertinent system metadata such as system age, usage patterns, core count, and processor type. The proposed ensemble technique integrates a diverse set of algorithms, including Long Short-Term Memory (LSTM) networks, isolation forests, one-class support vector machines (OCSVM), and local outlier factors (LOF), to effectively discern system failures. Specifically, the LSTM network with other machine learning techniques is trained on Intel…
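As an illustration of how the three classical detectors named in the abstract could be combined, here is a minimal sketch using scikit-learn. It is not the authors' implementation. The abstract is truncated before it says how the detector outputs are fused, so the majority-vote rule, the hyperparameters, the synthetic four-feature telemetry (CPU utilization, memory utilization, disk activity, CPU temperature), and the helper names `fit_detectors` and `predict_failures` are all assumptions. The LSTM component is left out to keep the example short.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fit_detectors(X_train, seed=0):
    """Fit three unsupervised detectors on telemetry assumed to be mostly healthy."""
    scaler = StandardScaler().fit(X_train)
    Xs = scaler.transform(X_train)
    detectors = [
        IsolationForest(n_estimators=100, random_state=seed).fit(Xs),
        OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(Xs),
        # novelty=True lets LOF score samples it was not fitted on
        LocalOutlierFactor(n_neighbors=20, novelty=True).fit(Xs),
    ]
    return scaler, detectors


def predict_failures(scaler, detectors, X, min_votes=2):
    """Flag a sample when at least `min_votes` detectors call it an outlier.

    The voting rule is an assumption made for this sketch.
    """
    Xs = scaler.transform(X)
    # Each scikit-learn detector returns -1 for outliers and +1 for inliers.
    votes = np.stack([d.predict(Xs) == -1 for d in detectors])
    return votes.sum(axis=0) >= min_votes


# Synthetic stand-in for telemetry: CPU %, memory %, disk activity %, CPU temp (C).
rng = np.random.default_rng(0)
healthy = rng.normal([40, 55, 20, 60], [8, 10, 5, 4], size=(500, 4))
failing = np.array([[99.0, 98.0, 95.0, 97.0], [100.0, 99.0, 90.0, 102.0]])

scaler, detectors = fit_detectors(healthy)
print(predict_failures(scaler, detectors, failing))
```

Requiring agreement between detectors trades some sensitivity for fewer false alarms. Lowering `min_votes` to 1 would flag anything that any single detector considers unusual.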
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
