Parallel Instance Filtering for Malware Detection

Martin Jureček; Olha Jurečková

arXiv:2206.13889 · cs.CR · June 29, 2022

TL;DR

This paper introduces Parallel Instance Filtering (PIF), a novel parallel algorithm for selecting representative instances from large malware datasets, significantly reducing data size with minimal impact on classification accuracy.

Contribution

The paper presents a new parallel instance filtering algorithm that efficiently reduces training data size for malware detection without sacrificing accuracy.

Findings

01. PIF significantly reduces training data size.
02. PIF outperforms existing methods in accuracy-to-storage ratio.
03. Parallel processing speeds up instance filtering.

Abstract

Machine learning algorithms are widely used in malware detection. As the number of samples grows, training classification algorithms becomes increasingly expensive. In addition, training data sets may contain redundant or noisy instances. The problem to be solved is how to select representative instances from large training data sets without reducing classification accuracy. This work presents a new parallel instance selection algorithm called Parallel Instance Filtering (PIF). The main idea of the algorithm is to split the data set into non-overlapping subsets of instances that together cover the whole data set, and to apply a filtering process to each subset. Each subset consists of instances that have the same nearest enemy, that is, the same closest instance of a different class. As a result, the PIF algorithm is fast, since the subsets are processed independently of each other using parallel computation. We compare the PIF algorithm with several…
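The partitioning idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the filtering rule applied inside each subset, so `filter_group` below is a hypothetical placeholder that simply keeps the `keep` members closest to the shared nearest enemy. The function names, the Euclidean distance, the brute-force neighbour search, and the thread pool are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from math import dist


def nearest_enemy(i, X, y):
    """Index of the closest instance whose label differs from y[i].

    Raises ValueError if the data contain only one class.
    """
    enemies = (j for j in range(len(X)) if y[j] != y[i])
    return min(enemies, key=lambda j: dist(X[i], X[j]))


def group_by_nearest_enemy(X, y):
    """Partition instance indices into disjoint groups that share a nearest enemy."""
    groups = defaultdict(list)
    for i in range(len(X)):
        groups[nearest_enemy(i, X, y)].append(i)
    return dict(groups)


def filter_group(enemy, members, X, keep=1):
    """Hypothetical filter (assumption): keep the `keep` members closest to the enemy."""
    return sorted(members, key=lambda i: dist(X[i], X[enemy]))[:keep]


def parallel_filter(X, y, keep=1, workers=4):
    """Filter every group independently and return the sorted indices that survive."""
    groups = group_by_nearest_enemy(X, y)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        kept = pool.map(
            lambda item: filter_group(item[0], item[1], X, keep),
            groups.items(),
        )
        return sorted(i for part in kept for i in part)


# Toy data: two classes on a line (illustrative values only).
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (5.0, 0.0), (5.1, 0.0), (4.0, 0.0)]
y = [0, 0, 0, 1, 1, 1]
selected = parallel_filter(X, y)
```

Because the groups are disjoint, no group needs to see the result of another, which is what makes the per-group step easy to parallelise. A thread pool is used here only to keep the sketch short; for CPU-bound filtering on real data a process pool or a vectorised neighbour search would likely be a better fit.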

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Advanced Malware Detection Techniques