Categorical anomaly detection in heterogeneous data using minimum description length clustering
James Cheney, Xavier Gombau, Ghita Berrada, Sidahmed Benabderrahmane

TL;DR
This paper introduces a meta-algorithm that extends MDL-based categorical anomaly detection methods to heterogeneous datasets by fitting mixture models. This improves detection performance in security scenarios where the data comes from multiple sources with distinct behavior patterns.
Contribution
It proposes a meta-algorithm that extends any MDL-based anomaly detection model to heterogeneous data by fitting a mixture model via a variant of k-means clustering, and evaluates it on synthetic and realistic security datasets.
Findings
Competitive performance with two previous anomaly detection algorithms using a discrete mixture model, on synthetic and realistic security data
Further gains with mixtures of more sophisticated models on the same datasets
Effective detection in heterogeneous, multi-source datasets
Abstract
Fast and effective unsupervised anomaly detection algorithms have been proposed for categorical data based on the minimum description length (MDL) principle. However, they can be ineffective when detecting anomalies in heterogeneous datasets representing a mixture of different sources, such as security scenarios in which system and user processes have distinct behavior patterns. We propose a meta-algorithm for enhancing any MDL-based anomaly detection model to deal with heterogeneous data by fitting a mixture model to the data, via a variant of k-means clustering. Our experimental results show that using a discrete mixture model provides competitive performance relative to two previous anomaly detection algorithms, while mixtures of more sophisticated models yield further gains, on both synthetic datasets and realistic datasets from a security scenario.
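The abstract describes the meta-algorithm only at a high level. The following minimal sketch illustrates the general idea under assumptions that are ours, not the paper's. Each mixture component is a product of independent per-attribute categorical distributions with add-one smoothing. A record's description length under a component is -log2 of the component weight plus -log2 of the record's probability, with model-cost terms omitted. The "variant of k-means" is taken to be hard reassignment of each record to the component that encodes it most cheaply, followed by refitting. Clusters are seeded farthest-first by Hamming distance. The anomaly score is the shortest code length over all components. All names (CategoricalModel, fit_mixture, anomaly_score, and the toy records) are hypothetical.

```python
import math
from collections import Counter


class CategoricalModel:
    """Hypothetical base model: independent categorical distribution per attribute,
    with add-one (Laplace) smoothing over each attribute's observed domain."""

    def __init__(self, records, domains):
        self.n = len(records)
        self.domains = domains
        self.counts = [Counter(r[j] for r in records) for j in range(len(domains))]

    def code_length(self, record):
        """Bits to encode `record` under this model, i.e. -log2 P(record)."""
        bits = 0.0
        for j, value in enumerate(record):
            p = (self.counts[j][value] + 1) / (self.n + len(self.domains[j]))
            bits -= math.log2(p)
        return bits


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def initial_labels(records, k):
    # Farthest-first seeding by Hamming distance, then nearest-seed assignment.
    seeds = [records[0]]
    while len(seeds) < k:
        seeds.append(max(records, key=lambda r: min(hamming(r, s) for s in seeds)))
    return [min(range(k), key=lambda c: hamming(r, seeds[c])) for r in records]


def fit_models(records, labels, k, domains):
    models = [CategoricalModel([r for r, lab in zip(records, labels) if lab == c], domains)
              for c in range(k)]
    # Smoothed mixture weights, so an empty cluster still has nonzero weight.
    weights = [(labels.count(c) + 1) / (len(records) + k) for c in range(k)]
    return models, weights


def cost(model, weight, record):
    # Code length of the cluster index plus the record given that cluster.
    return -math.log2(weight) + model.code_length(record)


def fit_mixture(records, k, max_iter=20):
    domains = [set(r[j] for r in records) for j in range(len(records[0]))]
    labels = initial_labels(records, k)
    for _ in range(max_iter):
        models, weights = fit_models(records, labels, k, domains)
        # k-means-style step: move each record to the cluster that encodes it most cheaply.
        new_labels = [min(range(k), key=lambda c: cost(models[c], weights[c], r))
                      for r in records]
        if new_labels == labels:
            break
        labels = new_labels
    models, weights = fit_models(records, labels, k, domains)
    return models, weights, labels


def anomaly_score(models, weights, record):
    # Shortest description length over all clusters; higher means more anomalous.
    return min(cost(m, w, record) for m, w in zip(models, weights))


# Toy heterogeneous data: two "sources" plus one record mixing their attributes.
A = ("system", "read", "local")
B = ("user", "write", "remote")
odd = ("system", "write", "remote")
data = [A] * 50 + [B] * 50 + [odd]

models, weights, labels = fit_mixture(data, k=2)
for rec in (A, B, odd):
    print(rec, round(anomaly_score(models, weights, rec), 2))
```

In this toy setup each attribute value of the odd record is common on its own, so a single model (k=1) actually gives it a lower score than both normal record types. With k=2 the two sources fall into separate clusters and the mismatched combination receives the highest score. A practical version would also need a way to choose k and to account for model cost, which this sketch does not attempt.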
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Time Series Analysis and Forecasting
