Data Drift Monitoring for Log Anomaly Detection Pipelines

Dipak Wani; Samuel Ackerman; Eitan Farchi; Xiaotong Liu; Hau-wen Chang; Sarasi Lalithsena

arXiv:2310.14893·cs.LG·October 24, 2023·2 cites

TL;DR

This paper presents a Bayesian drift detection method for Log Anomaly Detection pipelines that helps identify when models need updating due to changing log patterns, improving system reliability.

Contribution

The paper introduces a novel Bayes Factor-based approach for detecting changes in log patterns, supporting timely model updates in LAD systems.
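For reference, a Bayes factor is the ratio of the marginal likelihoods of the observed data D under two competing hypotheses. The page does not state which hypotheses the authors compare, so H_0 ("no drift from the reference log profile") and H_1 ("drift") below are generic placeholders, and θ stands for whatever model parameters each hypothesis integrates over.

```latex
\mathrm{BF}_{10} = \frac{p(D \mid H_1)}{p(D \mid H_0)},
\qquad
p(D \mid H) = \int p(D \mid \theta, H)\, p(\theta \mid H)\, d\theta,
\qquad
\frac{P(H_1 \mid D)}{P(H_0 \mid D)} = \mathrm{BF}_{10} \cdot \frac{P(H_1)}{P(H_0)}
```

A value of BF_10 well above 1 favours drift, and a value well below 1 favours the reference profile; the last identity shows how the factor updates whatever prior odds an operator holds.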

Findings

1. Effective detection of log pattern changes in real and simulated data
2. Identifies when retraining of LAD models is necessary
3. Supports human-in-the-loop decision making

Abstract

Logs enable the monitoring of infrastructure status and the performance of associated applications. Logs are also invaluable for diagnosing the root causes of any problems that may arise. Log Anomaly Detection (LAD) pipelines automate the detection of anomalies in logs, providing assistance to site reliability engineers (SREs) in system diagnosis. Log patterns change over time, necessitating updates to the LAD model that defines the 'normal' log activity profile. In this paper, we introduce a Bayes Factor-based drift detection method that identifies when intervention, retraining, and updating of the LAD model are required, with human involvement. We illustrate our method using sequences of log activity, both from unaltered data and from simulated activity with controlled levels of anomaly contamination, based on real collected log data.
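The abstract does not give the hypotheses or likelihood behind the Bayes factor. As a minimal sketch, assuming log lines have already been parsed into a fixed set of templates and counted per time window, the Python below compares a reference window with a current window using a textbook Dirichlet-multinomial Bayes factor. This is a swapped-in standard construction, not the authors' method; the function names, the symmetric prior `alpha`, the review threshold, and the example counts are all hypothetical.

```python
from math import lgamma, log


def log_marginal(counts, alpha):
    """Log marginal likelihood of a categorical sequence with these per-template
    counts under a symmetric Dirichlet(alpha) prior. The multinomial coefficient
    is omitted because it cancels in the Bayes factor below."""
    k = len(counts)
    n = sum(counts)
    result = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        result += lgamma(alpha + c) - lgamma(alpha)
    return result


def log_bayes_factor_drift(ref_counts, cur_counts, alpha=1.0):
    """Log Bayes factor of H1 (the two windows have different template
    distributions) against H0 (both windows share one distribution).
    Positive values favour drift; negative values favour no drift."""
    if len(ref_counts) != len(cur_counts):
        raise ValueError("both windows must count the same template set")
    pooled = [r + c for r, c in zip(ref_counts, cur_counts)]
    log_h1 = log_marginal(ref_counts, alpha) + log_marginal(cur_counts, alpha)
    log_h0 = log_marginal(pooled, alpha)
    return log_h1 - log_h0


def needs_review(ref_counts, cur_counts, alpha=1.0, threshold=log(10.0)):
    """Hypothetical human-in-the-loop rule: flag the window for an SRE to
    decide on retraining when the Bayes factor for drift exceeds 10."""
    return log_bayes_factor_drift(ref_counts, cur_counts, alpha) > threshold


if __name__ == "__main__":
    reference = [50, 30, 20]  # illustrative counts for three log templates
    similar = [48, 32, 20]    # roughly the same template mix
    shifted = [10, 20, 70]    # template mix has clearly changed
    print(log_bayes_factor_drift(reference, similar))
    print(log_bayes_factor_drift(reference, shifted))
    print(needs_review(reference, shifted))
```

With these illustrative counts, the similar window gives a negative log Bayes factor (evidence against drift), while the shifted window gives a large positive one and is flagged for review. Working in log space with `lgamma` avoids overflow for realistic window sizes, and leaving the final decision to a threshold-triggered review mirrors the human-in-the-loop framing rather than retraining automatically.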

Taxonomy

Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection