Revisiting Android App Categorization
Marco Alecci, Jordan Samhi, Tegawendé F. Bissyandé, Jacques Klein

TL;DR
This paper evaluates existing Android app categorization methods using a new dataset, demonstrating the superiority of description-based approaches and proposing two novel methods that enhance categorization accuracy and tool performance.
Contribution
The paper introduces a comprehensive evaluation framework, a new ground-truth dataset, and two new categorization approaches that outperform existing methods.
Findings
Description-based approaches outperform APK-based methods.
Proposed methods significantly improve categorization accuracy.
Enhanced categorization benefits downstream tools and tasks.
Abstract
Numerous tools rely on automatic categorization of Android apps as part of their methodology. However, incorrect categorization can lead to inaccurate outcomes, such as a malware detector wrongly flagging a benign app as malicious. One such example is the SlideIT Free Keyboard app, which has over 500,000 downloads on Google Play. Despite being a "Keyboard" app, it is often wrongly categorized alongside "Language" apps because its description focuses heavily on language support, resulting in incorrect analysis outcomes, including mislabeling it as potential malware when it is actually a benign app. Hence, there is a need to improve the categorization of Android apps to benefit all the tools relying on it. In this paper, we present a comprehensive evaluation of existing Android app categorization approaches using our new ground-truth dataset. Our evaluation demonstrates the notable…
Taxonomy
Topics: Advanced Malware Detection Techniques · Web Data Mining and Analysis · Mobile and Web Applications
