Scaling Up: Revisiting Mining Android Sandboxes at Scale for Malware Classification

Francisco Costa; Ismael Medeiros; Leandro Oliveira; João Calássio; Rodrigo Bonifácio; Krishna Narasimhan; Mira Mezini; Márcio Ribeiro

arXiv:2505.09501·cs.CR·May 15, 2025

TL;DR

This study reevaluates the Mining Android Sandbox (MAS) approach for malware detection on a dataset much larger than the 102 app pairs used in earlier evaluations. At that scale the approach's limitations become visible, which points to a need for complementary techniques.

Contribution

The paper provides a large-scale replication study of the MAS approach and shows that it performs worse on a larger, more diverse dataset than prior small-scale evaluations reported.

Findings

01 The MAS approach's F1-score drops from 0.90 to 0.54 on the larger dataset

02 Certain malware families are poorly detected by the MAS approach

03 Scaling up the dataset reduces the effectiveness of the original method
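
For reference, the F1-score cited in the findings is the harmonic mean of precision and recall. The sketch below computes it from raw confusion counts. The function name and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    tp: malware samples correctly flagged
    fp: benign samples wrongly flagged
    fn: malware samples that were missed
    """
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not the paper's data).
example = f1_score(tp=80, fp=20, fn=20)
print(f"F1 = {example:.2f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a classifier that loses either precision or recall on a larger dataset will see its F1-score fall sharply.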

Abstract

The widespread use of smartphones in daily life has raised concerns about privacy and security among researchers and practitioners. Privacy issues are highly prevalent in mobile applications, particularly those targeting the Android platform, the most popular mobile operating system. For this reason, several techniques have been proposed to identify malicious behavior in Android applications, including the Mining Android Sandbox approach (MAS approach), which aims to identify malicious behavior in repackaged Android applications (apps). However, previous empirical studies evaluated the MAS approach using a small dataset consisting of only 102 pairs of original and repackaged apps. This limitation raises questions about the external validity of their findings and whether the MAS approach can be generalized to larger datasets. To address these concerns, this paper presents the results…
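
The abstract does not spell out how the MAS approach classifies an app pair. As the idea is commonly described in the mining-sandboxes literature, the sensitive API calls observed while exercising the original app form a "sandbox", and a repackaged version is flagged when it makes sensitive calls outside that set. The sketch below illustrates that set-difference check under this assumption. The function names and the API strings are illustrative placeholders, not the authors' tooling or data.

```python
def mine_sandbox(observed_calls):
    """Build a sandbox: the set of sensitive APIs seen in the original app."""
    return frozenset(observed_calls)


def calls_outside_sandbox(sandbox, repackaged_calls):
    """Return sensitive APIs the repackaged app used that the original did not."""
    return set(repackaged_calls) - sandbox


def is_flagged(sandbox, repackaged_calls):
    """Flag the repackaged app if it makes any call outside the sandbox."""
    return bool(calls_outside_sandbox(sandbox, repackaged_calls))


# Illustrative traces (assumed for this sketch, not from the paper).
original_trace = [
    "android.location.LocationManager.getLastKnownLocation",
]
repackaged_trace = [
    "android.location.LocationManager.getLastKnownLocation",
    "android.telephony.SmsManager.sendTextMessage",
]

sandbox = mine_sandbox(original_trace)
extra = calls_outside_sandbox(sandbox, repackaged_trace)
print(sorted(extra), is_flagged(sandbox, repackaged_trace))
```

A check of this kind can only catch malicious behavior that shows up as a new sensitive call during exploration, which is one plausible reason some malware families would be harder to detect.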

Code & Models

Repositories

droidxp/paper-ecoop-results (official)
