Android App Feature Extraction: A review of approaches for malware and app similarity detection
Simon Torka, Sahin Albayrak

TL;DR
This review analyzes Android malware and app similarity detection research from 2002 to 2022, emphasizing the need for accessible, comprehensive datasets to improve reproducibility and cross-domain collaboration.
Contribution
It provides a systematic overview of existing approaches, highlights gaps in dataset availability, and offers guidelines and a schematic method for creating a comprehensive dataset.
Findings
Many studies neither publish their dataset nor describe how it was created
A need for accessible, well-documented datasets
Guidelines proposed for dataset creation
Abstract
This paper reviews work published between 2002 and 2022 in the fields of Android malware, clone, and similarity detection. It examines the data sources, tools, and features used in existing research and identifies the need for a comprehensive, cross-domain dataset to facilitate interdisciplinary collaboration and the exploitation of synergies between different research areas. Furthermore, it shows that many research papers publish neither their dataset nor a description of how it was created, making it difficult to reproduce or compare their results. The paper highlights the need for a dataset that is accessible, well-documented, and suitable for a range of applications. Guidelines are provided for this purpose, along with a schematic method for creating the dataset.
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Mobile and Web Applications
