# Android Malicious Application Classification Using Clustering

**Authors:** Hemant Rathore, Sanjay K. Sahay, Palash Chaturvedi, Mohit Sewak

arXiv: 1904.10142 · 2019-04-24

## TL;DR

This paper introduces a novel clustering approach, combined with machine learning classifiers, to improve Android malware detection. With clustering, random forest reaches 98.34% overall accuracy, while the best true positive rate (97.59%) comes from a decision tree and the best true negative rate (99.96%) from a support vector machine.

## Contribution

The paper proposes a scalable clustering method that improves the accuracy of machine learning classifiers at detecting malicious Android applications, compared with the regular method of taking all the data together.

## Key findings

- Achieved 98.34% overall detection accuracy with random forest.
- Best true positive rate with decision tree (97.59%).
- Best true negative rate with support vector machine (99.96%).

## Abstract

Android malware has been growing at an exponential pace and has become a serious threat to mobile users. It appears that most anti-malware products still rely on signature-based detection, which is generally slow and often unable to detect advanced obfuscated malware. Hence, various authors have proposed machine learning solutions to identify sophisticated malware. However, it appears that detection accuracy can be improved further by using clustering. Therefore, in this paper, we propose a novel, scalable, and effective clustering method to improve the detection accuracy of malicious Android applications. With it, the random forest classifier obtains a better overall accuracy (98.34%) than the regular method, i.e., taking the data altogether to detect the malware. As far as true positives and true negatives are concerned, with the clustering method the best true positive rate is obtained by the decision tree (97.59%) and the best true negative rate by the support vector machine (99.96%); these are almost the same as the random forest's true positive rate (97.30%) and true negative rate (99.38%), respectively. The overall accuracy of the random forest is nevertheless the highest because the true positive rate of the support vector machine and the true negative rate of the decision tree are significantly lower than those of the random forest.
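The abstract's closing argument can be checked with a little arithmetic: overall accuracy is the class-size-weighted mean of the true positive rate and the true negative rate. The sketch below is illustrative only. The helper `overall_accuracy` and the equal class counts are assumptions made here, since this summary does not give the dataset sizes. Under that assumption, the random forest rates quoted in the abstract are consistent with the reported 98.34%.

```python
def overall_accuracy(tpr, tnr, n_malware, n_benign):
    """Overall accuracy from per-class rates (given as fractions in [0, 1]).

    tpr: fraction of malware samples correctly flagged (true positive rate)
    tnr: fraction of benign samples correctly passed (true negative rate)
    """
    total = n_malware + n_benign
    if n_malware < 0 or n_benign < 0 or total == 0:
        raise ValueError("class counts must be non-negative and not both zero")
    correct = tpr * n_malware + tnr * n_benign
    return correct / total


# Random forest rates as reported in the abstract.
RF_TPR, RF_TNR = 0.9730, 0.9938

# ASSUMPTION: equal numbers of malicious and benign apps (hypothetical counts).
rf_acc = overall_accuracy(RF_TPR, RF_TNR, n_malware=1000, n_benign=1000)
print(f"{rf_acc:.4f}")  # prints 0.9834
```

With equal class counts, accuracy is the midpoint of the two rates. A classifier that edges ahead on one rate but falls noticeably behind on the other therefore ends up with a lower overall accuracy, which is the pattern the abstract describes for the decision tree and the support vector machine.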

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.10142/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1904.10142/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1904.10142/full.md

---
Source: https://tomesphere.com/paper/1904.10142