Topological Data Analysis for Anomaly Detection in Host-Based Logs

Thomas Davies

arXiv:2204.12919·cs.LG·April 28, 2022

TL;DR

This paper explores the use of Topological Data Analysis (TDA) to detect anomalies in host-based logs, demonstrating that topological and spectral features provide valuable, complementary information for classification and explainability.

Contribution

It introduces a novel approach to construct simplicial complexes from Windows logs for TDA-based anomaly detection, comparing topological and spectral features with standard log embeddings.

Findings

01 Topological and spectral embeddings improve anomaly classification.

02 TDA features are complementary to standard log embeddings.

03 Potential for explainable anomaly detection frameworks.

Abstract

Topological Data Analysis (TDA) gives practitioners the ability to analyse the global structure of cybersecurity data. We use TDA for anomaly detection in host-based logs collected with the open-source Logging Made Easy (LME) project. We present an approach that builds a filtration of simplicial complexes directly from Windows logs, enabling analysis of their intrinsic structure using topological tools. We compare the efficacy of persistent homology and the spectrum of graph and hypergraph Laplacians as feature vectors against a standard log embedding that counts events, and find that topological and spectral embeddings of computer logs contain discriminative information for classifying anomalous logs that is complementary to standard embeddings. We end by discussing the potential for our methods to be used as part of an explainable framework for anomaly detection.
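
To make the contrast in the abstract concrete, the sketch below computes two embeddings of a toy log: the event-count baseline, and 0-dimensional persistent homology of a time-indexed filtration. The abstract does not specify how the paper builds its complexes, so the construction here is an assumption made for illustration: each record contributes a simplex on the entities it mentions, entering the filtration at the record's timestamp. The `LOG` data, the entity names, and the functions `count_embedding` and `h0_persistence` are all hypothetical, and the two event IDs are used only as example values. Higher-dimensional homology and Laplacian spectra are not covered.

```python
from collections import Counter

# Hypothetical toy log: (timestamp, event_id, entities mentioned by the record).
LOG = [
    (1, 4624, ("alice", "host1")),
    (2, 4688, ("alice", "cmd.exe")),
    (3, 4624, ("bob", "host2")),
    (5, 4688, ("bob", "cmd.exe")),
]


def count_embedding(log):
    """Baseline embedding: number of occurrences of each event ID."""
    return Counter(event_id for _, event_id, _ in log)


def h0_persistence(log):
    """Return sorted (birth, death) pairs of connected components.

    Assumed filtration (not taken from the paper): a vertex is born at the
    timestamp of the first record mentioning it, and a record connects all of
    its entities at its own timestamp. Components are tracked with union-find;
    when two merge, the younger one dies (the elder rule).
    """
    parent, birth, pairs = {}, {}, []

    def find(v):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for t, _, entities in sorted(log, key=lambda record: record[0]):
        if not entities:
            continue
        for v in entities:
            if v not in parent:
                parent[v], birth[v] = v, t
        for v in entities[1:]:
            a, b = find(entities[0]), find(v)
            if a == b:
                continue
            if birth[a] > birth[b]:
                a, b = b, a  # make `a` the older component
            parent[b] = a  # the younger component `b` dies at time t
            pairs.append((birth[b], t))

    # Components that never merge persist forever.
    roots = {find(v) for v in parent}
    pairs.extend((birth[r], float("inf")) for r in roots)
    return sorted(pairs)


if __name__ == "__main__":
    print(count_embedding(LOG))
    print(h0_persistence(LOG))
```

In this toy log the two event IDs occur equally often, so the count embedding cannot tell whether the two users' activity is connected. The persistence pairs record that a second component appears at time 3 and merges into the first at time 5, when both users touch the same process name. This is the kind of structural signal the abstract describes as complementary to event counts.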

Taxonomy

Topics: Topological and Geometric Data Analysis · Anomaly Detection Techniques and Applications