Better Call Graphs: A New Dataset of Function Call Graphs for Malware Classification

Jakir Hossain; Gurvinder Singh; Lukasz Ziarek; Ahmet Erdem Sarıyüce

arXiv:2512.20872·cs.CR·December 25, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces Better Call Graphs, a large, diverse dataset of Android function call graphs designed to improve malware classification and address limitations of existing datasets.

Contribution

The paper presents a new, high-quality Android-specific function call graph (FCG) dataset that enhances malware detection research by providing more representative and diverse data.

Findings

01. Baseline experiments show improved classification accuracy with BCG

02. Existing datasets are outdated and limited in diversity

03. BCG enables more reliable evaluation of malware detection methods

Abstract

Function call graphs (FCGs) have emerged as a powerful abstraction for malware detection, capturing the behavioral structure of applications beyond surface-level signatures. Their utility in traditional program analysis has been well established, enabling effective classification and analysis of malicious software. In the mobile domain, especially in the Android ecosystem, FCG-based malware classification is particularly critical due to the platform's widespread adoption and the complex, component-based structure of Android apps. However, progress in this direction is hindered by the lack of large-scale, high-quality Android-specific FCG datasets. Existing datasets are often outdated, dominated by small or redundant graphs resulting from app repackaging, and fail to reflect the diversity of real-world malware. These limitations lead to overfitting and unreliable evaluation of…

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 8 · Confidence 4

Strengths

This seems to be a sound way to filter APKs and featurize them, especially by combining graph-level features with the function call graph features. It is definitely a necessary advancement for research, as malware datasets are currently quite old.

Weaknesses

It would be nice to have non-Android (x86) malware. Identifying specific changes in APK structures over time is often an important part of malware research: how do these features and graph structures change over time? Similarly, how does this perform on unseen data (new families that arise)? This dataset relies on existing tools like VirusTotal for malware classification and AVClass for label assignment. While I personally don't think this is necessarily an issue, I do think it's relatively

Reviewer 02 · Rating 5 · Confidence 4

Strengths

The authors propose a dataset of function call graphs (FCGs) from Android APKs for the task of malware classification. They downloaded more recent APKs, determined family and type, constructed FCGs, and removed duplicates and graphs with fewer than 100 edges. The resulting dataset has 9938 graphs, with an average of 25k nodes and 54k edges. It contains 29 types and 118 families. They extracted non-graph APK features (AF), such as services, receivers, and libraries. They also extracted graph fe
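The filtering step described in this review (dropping graphs with fewer than 100 edges and removing duplicates) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the edge-list representation, the helper names `edge_set_digest` and `filter_graphs`, and the choice to treat two graphs as duplicates when their sets of (caller, callee) pairs are identical are all assumptions made here.

```python
import hashlib

MIN_EDGES = 100  # threshold quoted in the review


def edge_set_digest(edges):
    """Hash the sorted set of (caller, callee) pairs.

    Assumption: two graphs count as duplicates when their edge sets are identical.
    """
    canonical = "\n".join(f"{src}->{dst}" for src, dst in sorted(set(edges)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def filter_graphs(graphs, min_edges=MIN_EDGES):
    """Keep graphs with at least min_edges distinct edges, dropping exact duplicates.

    `graphs` maps a (hypothetical) sample name to a list of (caller, callee) pairs.
    """
    seen = set()
    kept = {}
    for name, edges in graphs.items():
        if len(set(edges)) < min_edges:
            continue  # too small to keep
        digest = edge_set_digest(edges)
        if digest in seen:
            continue  # same edge set as an earlier graph
        seen.add(digest)
        kept[name] = edges
    return kept


# Toy example: b.apk repeats a.apk's edges in another order; c.apk is too small.
big = [(f"f{i}", f"f{i + 1}") for i in range(120)]
graphs = {"a.apk": big, "b.apk": list(reversed(big)), "c.apk": big[:10]}
print(list(filter_graphs(graphs)))  # ['a.apk']
```

Hashing a canonical edge list only catches exact duplicates. Repackaged apps with renamed methods would need a structural comparison instead, which this sketch does not attempt.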

Weaknesses

New algorithmic methods were not proposed and evaluated. More non-graph features could have been extracted, such as n-grams of instructions.

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. A new dataset is proposed. 2. The paper is well organized.

Weaknesses

The collection and construction of the dataset lack distinctiveness. The inclusion of new software and non-repetitive samples does not constitute the primary contribution of the paper.

Taxonomy

Topics: Advanced Malware Detection Techniques · Spam and Phishing Detection · Software Engineering Research