K-Metamodes: frequency- and ensemble-based distributed k-modes clustering for security analytics

Andrey Sapegin; Christoph Meinel

arXiv:1909.13721 · cs.LG · October 1, 2019

TL;DR

This paper introduces K-Metamodes, a novel distributed clustering algorithm for heterogeneous security data that directly handles mixed numerical and categorical attributes, improving efficiency and effectiveness in intrusion detection tasks.

Contribution

The paper proposes a new frequency-based distance function and adapts k-modes for distributed processing of mixed data, enhancing security analytics.

Findings

01 Higher clustering effectiveness on security datasets

02 Efficient handling of mixed numerical and categorical data

03 Improved performance over previous methods

Abstract

Nowadays, processing of Big Security Data, such as log messages, is commonly used for intrusion detection purposes. Its heterogeneous nature, as well as its combination of numerical and categorical attributes, does not allow existing data mining methods to be applied directly to the data without feature preprocessing. Therefore, a rather computationally expensive conversion of categorical attributes into vector space has to be used to analyse such data. However, the well-known k-modes algorithm can cluster categorical data directly and avoids the conversion into vector space. The existing implementations of k-modes for Big Data processing are ensemble-based and use two-step clustering: data subsets are first clustered independently, and the resulting cluster modes are then clustered again to calculate metamodes valid for all data subsets. In this paper, the…
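The two-step, ensemble-based scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation and not the paper's frequency-based distance function: it assumes plain k-modes with simple matching (Hamming) dissimilarity, runs sequentially rather than on a cluster, and all names (`hamming`, `column_modes`, `k_modes`, `metamodes`) and the toy log records are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Most frequent value of each categorical attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, k, iterations=10, seed=0):
    """Plain k-modes over tuples of categorical values (illustrative only)."""
    rng = random.Random(seed)
    unique = sorted(set(records))
    k = min(k, len(unique))  # cannot have more modes than distinct records
    modes = rng.sample(unique, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for record in records:
            nearest = min(range(k), key=lambda i: hamming(record, modes[i]))
            clusters[nearest].append(record)
        # Keep the old mode if a cluster ended up empty.
        new_modes = [column_modes(c) if c else modes[i]
                     for i, c in enumerate(clusters)]
        if new_modes == modes:
            break
        modes = new_modes
    return modes


def metamodes(subsets, k, seed=0):
    """Step 1: cluster each subset independently. Step 2: cluster the modes."""
    local_modes = [m for s in subsets for m in k_modes(s, k, seed=seed)]
    return k_modes(local_modes, k, seed=seed)


# Hypothetical log records: (user, action, status)
subset_a = [("alice", "login", "ok"), ("alice", "login", "ok"),
            ("bob", "login", "fail"), ("bob", "login", "fail")]
subset_b = [("alice", "login", "ok"), ("bob", "login", "fail"),
            ("bob", "login", "fail"), ("alice", "login", "ok")]

result = metamodes([subset_a, subset_b], k=2)
print(sorted(result))
```

In a distributed setting the first step would run once per data partition, so only the small set of local modes has to be gathered for the second step. The sketch keeps both steps in one process to stay self-contained.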

Code & Models

Repositories

asapegin/pyspark-kmetamodes
Official
