An MDL-Based Classifier for Transactional Datasets with Application in Malware Detection
Behzad Asadi, Vijay Varadharajan

TL;DR
This paper introduces an MDL-based classifier for transactional datasets, specifically applied to static malware detection using API call patterns, achieving performance comparable to deep neural networks with added interpretability.
Contribution
The paper presents a novel MDL-based classification method that summarizes patterns efficiently for malware detection, avoiding pattern explosion and providing interpretability.
Findings
The classifier performs close to deep neural networks on static malware detection.
The method summarizes patterns without extracting all closed frequent patterns, which avoids pattern explosion.
The classifier is interpretable.
Abstract
We design a classifier for transactional datasets with application in malware detection. We build the classifier based on the minimum description length (MDL) principle. This involves selecting a model that best compresses the training dataset for each class considering the MDL criterion. To select a model for a dataset, we first use clustering followed by closed frequent pattern mining to extract a subset of closed frequent patterns (CFPs). We show that this method acts as a pattern summarization method to avoid pattern explosion; this is done by giving priority to longer CFPs, and without requiring the extraction of all CFPs. We then use the MDL criterion to further summarize the extracted patterns and construct a code table of patterns. This code table is taken as the selected model for compressing the dataset. We evaluate our classifier for the problem of static malware detection…
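To make the code-table idea in the abstract concrete, here is a minimal sketch of compression-based classification. It is not the authors' implementation. It assumes a greedy, longest-pattern-first cover and Laplace-smoothed code lengths, and it replaces the clustering and CFP mining steps with a hand-supplied list of candidate patterns. The helper names (`cover`, `build_code_table`, `encoded_length`, `classify`) and the toy API-call transactions are hypothetical.

```python
from collections import Counter
from math import log2


def cover(transaction, patterns):
    """Greedily cover a transaction with non-overlapping patterns, in table order."""
    remaining = set(transaction)
    used = []
    for p in patterns:
        if p <= remaining:
            used.append(p)
            remaining -= p
    return used


def build_code_table(transactions, candidate_patterns, alphabet):
    """Return {pattern: code length in bits} for one class (assumed scheme)."""
    multi = {frozenset(p) for p in candidate_patterns if len(p) > 1}
    # Longer patterns get priority; singletons guarantee a full cover.
    patterns = sorted(multi, key=lambda p: (-len(p), sorted(p)))
    patterns += [frozenset([item]) for item in sorted(alphabet)]
    usage = Counter()
    for t in transactions:
        usage.update(cover(t, patterns))
    # Laplace smoothing so unused patterns still get a finite code length.
    total = sum(usage.values()) + len(patterns)
    return {p: -log2((usage[p] + 1) / total) for p in patterns}


def encoded_length(transaction, code_table):
    """Bits needed to encode a transaction with a class's code table."""
    return sum(code_table[p] for p in cover(transaction, list(code_table)))


def classify(transaction, code_tables):
    """Assign the class whose code table compresses the transaction best."""
    return min(code_tables, key=lambda c: encoded_length(transaction, code_tables[c]))


# Toy training data: each transaction is the set of API calls in one sample.
malware = [
    {"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"},
    {"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread", "ReadFile"},
    {"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"},
]
benign = [
    {"CreateFile", "ReadFile", "CloseHandle"},
    {"CreateFile", "ReadFile", "CloseHandle", "WriteFile"},
    {"CreateFile", "WriteFile", "CloseHandle"},
]
alphabet = set().union(*malware, *benign)

code_tables = {
    "malware": build_code_table(
        malware, [{"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"}], alphabet
    ),
    "benign": build_code_table(benign, [{"CreateFile", "CloseHandle"}], alphabet),
}

print(classify({"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"}, code_tables))
print(classify({"CreateFile", "ReadFile", "CloseHandle"}, code_tables))
```

The patterns used to cover a sample under the winning code table can be read off directly, which is one way such a classifier could expose the reasons behind a decision.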
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Spam and Phishing Detection
Methods: Minimum Description Length
