Synthetic Data: AI's New Weapon Against Android Malware
Angelo Gaspar Diniz Nogueira, Kayua Oleques Paim, Hendrio Bragança, Rodrigo Brandão Mansilha, Diego Kreutz

TL;DR
This paper introduces MalSynGen, a cGAN-based method for generating high-quality synthetic Android malware data to enhance classifier performance and address data scarcity issues.
Contribution
It presents a novel synthetic data generation approach using cGANs specifically designed for Android malware detection datasets.
Findings
Synthetic data improves malware classifier accuracy
MalSynGen generalizes across multiple datasets
Generated data maintains statistical fidelity
Abstract
The ever-increasing number of Android devices and the accelerated evolution of malware, reaching over 35 million samples by 2024, highlight the critical importance of effective detection methods. Attackers are now using Artificial Intelligence to create sophisticated malware variations that can easily evade traditional detection techniques. Although machine learning has shown promise in malware classification, its success relies heavily on the availability of up-to-date, high-quality datasets. The scarcity and high cost of obtaining and labeling real malware samples present significant challenges in developing robust detection models. In this paper, we propose MalSynGen, a Malware Synthetic Data Generation methodology that uses a conditional Generative Adversarial Network (cGAN) to generate synthetic tabular data. This data preserves the statistical properties of real-world data and…
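As an illustration of what "preserving statistical properties" can mean for tabular malware data, the sketch below compares per-feature frequencies between real and synthetic samples, separately for each class label. It is not the paper's evaluation procedure: the binary feature encoding, the toy rows, and the names `feature_frequencies` and `fidelity_gap` are all assumptions made for this example.

```python
from statistics import mean


def feature_frequencies(rows):
    """Return, for each binary feature, the fraction of rows in which it is set."""
    n_features = len(rows[0])
    return [mean(row[j] for row in rows) for j in range(n_features)]


def fidelity_gap(real, synthetic):
    """Mean absolute difference in per-feature frequency, computed per class label.

    `real` and `synthetic` map a class label to a list of binary feature rows.
    A gap near 0 means the synthetic rows reproduce the real marginal frequencies.
    """
    gaps = {}
    for label in real:
        real_freq = feature_frequencies(real[label])
        synth_freq = feature_frequencies(synthetic[label])
        gaps[label] = mean(abs(r - s) for r, s in zip(real_freq, synth_freq))
    return gaps


# Hypothetical toy data: each row is a vector of binary features
# (for example, whether an app requests a given permission).
real = {
    "malware": [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 1, 0]],
    "benign": [[0, 0, 1], [0, 1, 1], [0, 0, 0], [0, 0, 1]],
}
synthetic = {
    "malware": [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]],
    "benign": [[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 0, 1]],
}

gaps = fidelity_gap(real, synthetic)
for label, gap in gaps.items():
    print(f"{label}: mean absolute frequency gap = {gap:.3f}")
```

Matching marginal frequencies is only a first-order check. It says nothing about correlations between features, which a fuller comparison would also need to cover.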
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Network Security and Intrusion Detection
