On Impact of Semantically Similar Apps in Android Malware Datasets
Roopak Surendran

TL;DR
This paper investigates how semantically similar Android malware apps affect machine learning detection performance and proposes a clustering method to identify and remove such apps for more accurate evaluation.
Contribution
It introduces a novel opcode subsequence clustering algorithm to identify semantically similar malware and demonstrates its impact on ML model performance in malware detection.
Findings
Detection rates drop once semantically similar apps are removed, indicating that such apps influence the reported performance.
Clustering helps in more accurate evaluation of malware detection models.
Proposes a method to improve ML evaluation by eliminating semantically similar apps.
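The idea of grouping apps by shared opcode subsequences and keeping one representative per group can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the use of fixed-length opcode n-grams, Jaccard similarity, the 0.8 threshold, single-linkage grouping, and all function names are assumptions made for the example, and the opcode lists are toy data.

```python
from itertools import combinations


def opcode_ngrams(opcodes, n=3):
    """Return the set of contiguous length-n opcode subsequences."""
    return {tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_similar_apps(apps, n=3, threshold=0.8):
    """Group apps whose n-gram sets reach the (assumed) similarity threshold.

    `apps` maps an app name to its opcode list. Grouping is single-linkage,
    implemented with a small union-find structure.
    """
    names = list(apps)
    grams = {name: opcode_ngrams(apps[name], n) for name in names}
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if jaccard(grams[a], grams[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def keep_one_per_cluster(clusters):
    """Keep a single representative (alphabetically first) from each cluster."""
    return [sorted(cluster)[0] for cluster in clusters]


# Toy data: two apps with identical opcode sequences and one unrelated app.
apps = {
    "mal_a": ["const-string", "invoke-virtual", "move-result-object",
              "invoke-virtual", "return-void"],
    "mal_b": ["const-string", "invoke-virtual", "move-result-object",
              "invoke-virtual", "return-void"],
    "good_c": ["new-instance", "iput-object", "iget-object",
               "goto", "return-void"],
}

clusters = cluster_similar_apps(apps)
representatives = keep_one_per_cluster(clusters)
```

Training and testing a detector on `representatives` rather than on every app in `apps` is one way to check how much near-duplicate samples contribute to a reported detection rate.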
Abstract
Malware authors reuse program segments found in other applications to perform similar kinds of malicious activities, such as stealing information or sending SMS messages. Hence, a family or dataset may contain several semantically similar malware samples. Many researchers are unaware of these semantically similar apps and use their features when evaluating their ML models. As a result, the reported performance measures might be seriously affected by such apps. In this paper, we study the impact of semantically similar applications on the performance measures of ML-based Android malware detectors. To this end, we propose a novel opcode-subsequence-based malware clustering algorithm to identify semantically similar malware and goodware apps. To study the impact of semantically similar apps on the performance measures, we tested the performance of distinct ML…
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Mobile and Web Applications
