ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection
Chris Egersdoerfer, Dong Dai, Di Zhang

TL;DR
ClusterLog is a novel log pre-processing method that clusters semantically similar logs to improve anomaly detection in HPC file systems, addressing challenges of irregularity and ambiguity in log sequences.
Contribution
This paper introduces ClusterLog, a new clustering-based log pre-processing technique that enhances log sequence representation for anomaly detection in parallel file systems.
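The core idea of the pre-processing step is to map many distinct log keys onto fewer cluster IDs, so that a log sequence is expressed with a smaller vocabulary. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each log key already has an embedding vector. The toy 3-d vectors below are hypothetical stand-ins for semantic (and possibly sentiment) embeddings. The greedy cosine-threshold clustering, the 0.9 threshold, and the function names are illustrative choices and may differ from the clustering algorithm ClusterLog actually uses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_log_keys(embeddings, threshold=0.9):
    """Hypothetical greedy clustering: each key joins the first cluster whose
    representative (its first member) is at least `threshold` cosine-similar;
    otherwise the key starts a new cluster. Returns {log_key: cluster_id}."""
    reps = []  # list of (cluster_id, representative vector)
    assignment = {}
    for key, vec in embeddings.items():
        for cid, rep in reps:
            if cosine(vec, rep) >= threshold:
                assignment[key] = cid
                break
        else:  # no sufficiently similar cluster found
            cid = len(reps)
            reps.append((cid, vec))
            assignment[key] = cid
    return assignment


def relabel_sequence(log_keys, assignment):
    """Replace each log key in a temporal sequence with its cluster ID."""
    return [assignment[k] for k in log_keys]


# Toy embeddings (assumed, not from the paper): K1/K2 and K3/K4 are near-duplicates.
embeddings = {
    "K1": [1.0, 0.0, 0.0],    # e.g. a "connection established" template
    "K2": [0.95, 0.05, 0.0],  # e.g. a "connection opened" template
    "K3": [0.0, 1.0, 0.0],    # e.g. a "write failed" template
    "K4": [0.0, 0.98, 0.1],   # e.g. a "write error" template
}
assignment = cluster_log_keys(embeddings)
clustered = relabel_sequence(["K1", "K2", "K3", "K1", "K4"], assignment)
print(assignment)  # {'K1': 0, 'K2': 0, 'K3': 1, 'K4': 1}
print(clustered)   # [0, 0, 1, 0, 1]
```

In this toy run, four unique log keys collapse to two cluster IDs while the order of events in the sequence is preserved.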
Findings
Reduces log sequence granularity without losing important information
Improves anomaly detection effectiveness in HPC file systems
Demonstrates generalizability across different log datasets
Abstract
As scalable file systems become more prevalent in High Performance Computing (HPC), accurate anomaly detection on their runtime logs is increasingly important. However, many state-of-the-art methods for log-based anomaly detection, such as DeepLog, struggle when applied to logs from many parallel file systems (PFSes), often because the time-based log sequences these systems produce are irregular and ambiguous. To address these problems, this study proposes ClusterLog, a log pre-processing method that clusters the temporal sequence of log keys based on their semantic similarity. By grouping logs that are similar in semantics and sentiment, this approach aims to represent log sequences with the smallest possible number of unique log keys, with the goal of improving a downstream sequence-based model's ability to learn log patterns.…
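The downstream model consumes the clustered sequence. DeepLog, for example, predicts the next log key from a window of the previous h keys and flags an entry as anomalous when the observed key is not among the top g predicted candidates. The following sketch shows that top-g check on a sequence of cluster IDs. It replaces DeepLog's LSTM with a simple frequency-count table, which is a deliberate simplification and not the model used in the paper. The window size h=2, g=1, the function names, and the training sequences are hypothetical.

```python
from collections import Counter, defaultdict


def train_next_key_counts(sequences, h=2):
    """Count which key follows each length-h window in normal training sequences.
    (Stand-in for an LSTM that learns P(next key | previous h keys).)"""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(h, len(seq)):
            counts[tuple(seq[i - h:i])][seq[i]] += 1
    return counts


def detect_anomalies(seq, counts, h=2, g=1):
    """Return indices whose key is not among the top-g candidates for its window.
    A window never seen in training has no candidates, so its next key is flagged."""
    flagged = []
    for i in range(h, len(seq)):
        window_counts = counts.get(tuple(seq[i - h:i]), Counter())
        candidates = [k for k, _ in window_counts.most_common(g)]
        if seq[i] not in candidates:
            flagged.append(i)
    return flagged


# Hypothetical normal sequences, already relabeled with cluster IDs.
normal = [[0, 0, 1, 0, 1], [0, 0, 1, 0, 1, 0, 1]]
counts = train_next_key_counts(normal)
print(detect_anomalies([0, 0, 1, 0, 1], counts))  # []
print(detect_anomalies([0, 0, 1, 1, 0], counts))  # [3, 4]
```

This sketch also suggests why a smaller key vocabulary can help. With fewer unique keys there are fewer distinct windows, so each context is observed more often during training and the learned next-key distribution is less sparse.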
Taxonomy
Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection
