Parallel Instance Filtering for Malware Detection
Martin Jureček, Olha Jurečková

TL;DR
This paper introduces Parallel Instance Filtering (PIF), a novel parallel algorithm for selecting representative instances from large malware datasets, significantly reducing data size with minimal impact on classification accuracy.
Contribution
The paper presents a new parallel instance filtering algorithm that efficiently reduces training data size for malware detection without sacrificing accuracy.
Findings
PIF significantly reduces training data size.
PIF outperforms existing methods in accuracy-to-storage ratio.
Parallel processing speeds up instance filtering.
Abstract
Machine learning algorithms are widely used in the area of malware detection. With the growth of sample amounts, training of classification algorithms becomes more and more expensive. In addition, training data sets may contain redundant or noisy instances. The problem to be solved is how to select representative instances from large training data sets without reducing the accuracy. This work presents a new parallel instance selection algorithm called Parallel Instance Filtering (PIF). The main idea of the algorithm is to split the data set into non-overlapping subsets of instances covering the whole data set and apply a filtering process for each subset. Each subset consists of instances that have the same nearest enemy. As a result, the PIF algorithm is fast since subsets are processed independently of each other using parallel computation. We compare the PIF algorithm with several…
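The partitioning idea in the abstract can be illustrated with a small sketch. It assumes Euclidean distance, at least two classes in the data, and a placeholder filtering rule (`keep_closest`) that is purely hypothetical: the abstract does not specify the filtering step, so this is not the authors' method. All function names here are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from math import dist


def nearest_enemy_index(i, X, y):
    """Index of the closest instance whose label differs from y[i]."""
    enemies = [j for j in range(len(X)) if y[j] != y[i]]
    return min(enemies, key=lambda j: dist(X[i], X[j]))


def partition_by_nearest_enemy(X, y):
    """Group instance indices by nearest enemy.

    Every instance has exactly one nearest enemy, so the groups are
    non-overlapping and together cover the whole data set.
    """
    groups = defaultdict(list)
    for i in range(len(X)):
        groups[nearest_enemy_index(i, X, y)].append(i)
    return dict(groups)


def keep_closest(item, X, keep=1):
    """Hypothetical placeholder filter (an assumption, not PIF's rule):
    keep the `keep` members closest to the shared nearest enemy."""
    enemy, members = item
    return sorted(members, key=lambda i: dist(X[i], X[enemy]))[:keep]


def filter_in_parallel(X, y, keep=1):
    """Filter each subset independently, one task per subset."""
    groups = partition_by_nearest_enemy(X, y)
    with ThreadPoolExecutor() as pool:
        kept = pool.map(lambda item: keep_closest(item, X, keep), groups.items())
        return sorted(i for part in kept for i in part)


# Toy data: three instances of class 0 and two of class 1 on a line.
X = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (1.0, 0.0), (1.1, 0.0)]
y = [0, 0, 0, 1, 1]

groups = partition_by_nearest_enemy(X, y)   # {3: [0, 1, 2], 2: [3, 4]}
selected = filter_in_parallel(X, y)         # [2, 3]
```

Because each subset is handled without reference to the others, the per-subset tasks can be distributed freely. The thread pool here only demonstrates that independence; for CPU-bound filtering in Python, a process pool with a picklable top-level function would likely be needed to obtain a real speedup.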
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Advanced Malware Detection Techniques
