TL;DR
This paper introduces MLCDroid, a multi-label classification system for Android malware that identifies multiple malicious behaviors in a single sample. It uses active learning and data augmentation to improve accuracy and to give analysts a more detailed picture of each sample.
Contribution
It presents the first multi-label Android malware classification approach that detects multiple malicious behaviors and employs active learning to enhance accuracy with limited labeled data.
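To make the two ideas above concrete, here is a minimal sketch of how a multi-label target and an active-learning query step could look. It is not MLCDroid's implementation: the summary does not name the six behaviors, the features, the classifier, or the query strategy, so the label names (`behavior_0` to `behavior_5`), the helper functions, and the use of uncertainty sampling are all assumptions chosen for illustration.

```python
# Hypothetical placeholder names: the summary says there are six behaviors
# but does not list them.
BEHAVIORS = [f"behavior_{i}" for i in range(6)]


def to_indicator(labels):
    """Encode a set of behavior names as a 0/1 vector, one slot per behavior.

    Unlike binary or family classification, several slots can be 1 at once.
    """
    return [1 if name in labels else 0 for name in BEHAVIORS]


def uncertainty(probs):
    """Score how unsure a model is about one sample (assumed strategy).

    Each per-label probability contributes 1 at p = 0.5 (maximally unsure)
    and 0 at p = 0 or p = 1 (fully confident); the score is the mean.
    """
    return sum(1.0 - abs(2.0 * p - 1.0) for p in probs) / len(probs)


def select_queries(pool_probs, k):
    """Return indices of the k most uncertain unlabeled samples.

    In an active-learning loop these samples would be sent to a human
    analyst for labeling and then added to the training set.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: uncertainty(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # One sample exhibiting two behaviors at the same time.
    print(to_indicator({"behavior_0", "behavior_3"}))

    # Predicted per-label probabilities for three unlabeled samples
    # (made-up numbers for illustration only).
    pool = [[0.9] * 6, [0.5] * 6, [0.1] * 6]
    print(select_queries(pool, k=1))
```

Uncertainty sampling is used here only because it is a common baseline for choosing which samples to label when labeling budget is limited; any other query strategy would slot into `select_queries` the same way.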
Findings
Reached up to 73.3% effectiveness in a comparison of classification algorithms.
Improved accuracy to 86.7% using active learning and data augmentation.
Constructed a labeled dataset of six malicious behaviors from real-world malware.
Abstract
The existing malware classification approaches (i.e., binary and family classification) can barely benefit subsequent analysis with their outputs. Even the family classification approaches suffer from lacking a formal naming standard and an incomplete definition of malicious behaviors. More importantly, the existing approaches are powerless for one malware with multiple malicious behaviors, while this is a very common phenomenon for Android malware in the wild. So, neither of them can provide researchers with a direct and comprehensive enough understanding of malware. In this paper, we propose MLCDroid, an ML-based multi-label classification approach that can directly indicate the existence of pre-defined malicious behaviors. With an in-depth analysis, we summarize six basic malicious behaviors from real-world malware with security reports and construct a labeled dataset. We compare the…
