Better Call Graphs: A New Dataset of Function Call Graphs for Malware Classification
Jakir Hossain, Gurvinder Singh, Lukasz Ziarek, Ahmet Erdem Sarıyüce

TL;DR
This paper introduces Better Call Graphs, a large, diverse dataset of Android function call graphs designed to improve malware classification and address limitations of existing datasets.
Contribution
The paper presents a new, high-quality Android-specific FCG dataset that enhances malware detection research by providing more representative and diverse data.
Findings
Baseline experiments show improved classification accuracy with BCG
Existing datasets are outdated and limited in diversity
BCG enables more reliable evaluation of malware detection methods
Abstract
Function call graphs (FCGs) have emerged as a powerful abstraction for malware detection, capturing the behavioral structure of applications beyond surface-level signatures. Their utility in traditional program analysis has been well established, enabling effective classification and analysis of malicious software. In the mobile domain, especially in the Android ecosystem, FCG-based malware classification is particularly critical due to the platform's widespread adoption and the complex, component-based structure of Android apps. However, progress in this direction is hindered by the lack of large-scale, high-quality Android-specific FCG datasets. Existing datasets are often outdated, dominated by small or redundant graphs resulting from app repackaging, and fail to reflect the diversity of real-world malware. These limitations lead to overfitting and unreliable evaluation of…
Peer Reviews
Decision · ICLR 2025 Conference Withdrawn Submission
This seems to be a sound way to filter APKs and featurize them, especially the combination of graph-level features with function call graph features. It is definitely a necessary advancement for research, as current malware datasets are quite old.
It would be nice to have non-Android (x86) malware. Identifying specific changes in APK structures over time is often an important part of malware research: how do these features and graph structures change over time? Similarly, how does this do on unseen data (new families that arise)? This dataset relies on existing tools like VirusTotal for malware classification and AVClass for label assignment. While I personally don't think this is necessarily an issue, I do think it's relatively
The authors propose a dataset of function call graphs (FCGs) from Android APKs for the task of malware classification. They downloaded more recent APKs, determined family and type, constructed FCGs, and removed duplicates and graphs with fewer than 100 edges. The resulting dataset has 9938 graphs, with an average of 25k nodes and 54k edges. It contains 29 types and 118 families. They extracted non-graph APK features (AF), such as services, receivers, and libraries. They also extracted graph fe
New algorithmic methods were not proposed and evaluated. More non-graph features could have been extracted, such as n-grams of instructions.
1. A new dataset is proposed. 2. The paper is well organized.
The collection and construction of the dataset lack distinctiveness. The inclusion of new software and non-repetitive samples does not constitute the primary contribution of the paper.
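The filtering step mentioned in the reviews (dropping graphs with fewer than 100 edges and removing duplicates) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the edge-list representation, the function names `edge_fingerprint` and `filter_call_graphs`, and the choice to treat two graphs as duplicates only when their labeled edge sets match exactly are all assumptions made for illustration.

```python
import hashlib

MIN_EDGES = 100  # threshold quoted in the review; smaller graphs are dropped


def edge_fingerprint(edges):
    """Hash the sorted set of (caller, callee) pairs.

    Assumption: two graphs count as duplicates only when their labeled
    edge sets are identical. This is exact matching, not isomorphism.
    """
    canonical = "\n".join(f"{src}->{dst}" for src, dst in sorted(set(edges)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def filter_call_graphs(graphs, min_edges=MIN_EDGES):
    """Return the names of graphs that pass the size filter and are unique.

    `graphs` maps a (hypothetical) sample name to a list of
    (caller, callee) edges. The first occurrence of a duplicate is kept.
    """
    seen = set()
    kept = []
    for name, edges in graphs.items():
        unique_edges = set(edges)
        if len(unique_edges) < min_edges:
            continue  # too small to be informative
        fingerprint = edge_fingerprint(unique_edges)
        if fingerprint in seen:
            continue  # same edge set as an earlier sample
        seen.add(fingerprint)
        kept.append(name)
    return kept
```

Exact edge-set hashing is the simplest possible notion of a duplicate; repackaged apps with renamed methods would need a structural comparison instead, which this sketch does not attempt.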
Taxonomy
Topics: Advanced Malware Detection Techniques · Spam and Phishing Detection · Software Engineering Research
