Are cracked applications really free? An empirical analysis on Android devices

Konstantinos-Panagiotis Grammatikakis; Angela Ioannou; Stavros Shiaeles; Nicholas Kolokotronis

arXiv:1903.04793·cs.CR·March 13, 2019


TL;DR

This study empirically analyzes cracked Android applications, revealing they generally use more resources and request more dangerous permissions than official apps, raising security concerns.

Contribution

It provides a comparative behavioral analysis of cracked versus official Android apps using real device data and introduces an application intention score for classification.

Findings

01

Cracked apps request more dangerous permissions.

02

Cracked apps consume more system resources.

03

Cracked apps are more likely to be malicious.



Tables

Table IV: Average usage overhead per application class.

Indicator	malicious	rather malicious	rather benign	benign
CPU (%)	0.43	-0.01	0.24	0.06
RAM (MiB)	1.81	0.91	2.03	2.84
TCP ports	41.29	14.78	-12.11	8.00
HTTP ports	23.02	10.34	12.78	18.67

Equations

$$s_{i}=\sum_{l=1}^{k}w_{l}\,\delta_{il},\quad i=1,\ldots,n$$

$$\delta_{il}=\operatorname{sgn}\left(\sum_{j\in\Pi_{l}}\bigl(p_{ij}^{o}-p_{ij}^{c}\bigr)\right)$$

$$\mathcal{L}(s)=\begin{cases}\ell_{1}:\text{“malicious”}, & \text{if } s<-0.4,\\ \ell_{2}:\text{“rather malicious”}, & \text{if } -0.4\leq s<0,\\ \ell_{3}:\text{“rather benign”}, & \text{if } 0\leq s\leq 0.4,\\ \ell_{4}:\text{“benign”}, & \text{otherwise}.\end{cases}$$


Full text

WiP: Are cracked applications really free? An empirical analysis on Android devices†

†This work was supported by the CYBER-TRUST project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 786698.

Konstantinos-Panagiotis Grammatikakis², Angela Ioannou³, Stavros Shiaeles¹ and Nicholas Kolokotronis²

²Department of Informatics and Telecommunications, University of Peloponnese, 22131 Tripolis, Greece


³School of Pure and Applied Sciences, Open University of Cyprus, Nicosia 2220, Cyprus


¹Centre for Security, Communications and Networks Research, Plymouth University, Plymouth PL4 8AA, UK


Abstract

Android is among the most popular platforms running on millions of smart devices, such as smartphones and tablets, and its widespread adoption is seen as an opportunity for spreading malware. Adding malicious payloads to cracked applications, often popular ones, downloaded from untrusted third-party markets is a prevalent way of achieving this goal. In this paper, we compare 25 applications obtained from the official application store and from third-party stores that deliver cracked applications. The behavioral analysis of the applications is carried out on three real devices running different Android versions, using five indicators: requested permissions, CPU usage, RAM usage, and the number of opened ports for TCP and HTTP. Based on the requested permissions, we compute an application intention score, classify cracked applications as malicious or benign, and examine how the remaining indicators relate to this classification. The experimental results show that cracked applications on average utilize more resources and request access to more (dangerous) permissions than their official counterparts.

1 Introduction

Over the past decade, we have seen the development of more powerful and compact computing devices, such as mobile phones, tablets, ultra-portable computers and home appliances, with capabilities that were once available only on personal computers. At the same time, the high degree of network connectivity and the provisioning of high-speed broadband services led to novel services that take advantage of these new capabilities, thus transforming these common devices into smart devices. According to the Pew Research Center, $77\%$ of the surveyed population in the United States owns a smartphone and $53\%$ owns a tablet [13]. Beyond the traditional role of smart devices as a medium for communication and media consumption, the advent of constant Internet connectivity means that many traditional services, such as financial or remote administration services, are now carried out through mobile platforms. The popularity of the Android operating system (OS) grew with the popularity of smart devices: it holds the largest share of the mobile computing device market, $74.39\%$ according to [18], and also leads the personal computing device market as a whole with a share of $40\%$ [17]. This widespread adoption of smart devices, and of the Android OS in particular, together with the increasing value of the services and applications they provide, makes them a valuable target for malicious actors.

Millions of applications are available for the Android OS through the official application store (an estimated $3.5$ million [19]) and through third-party application stores. Because the Android OS allows the installation of applications from third-party markets or other generally untrusted sources (also known as sideloading), and given the sheer number of available applications, malware installation is a viable attack vector.

In this paper we use the term malware to denote malicious software, that is, computer programs (applications, in our case) developed with the intention of harming a computer network, a system, or its users [16]. Common malicious actions include abusing services such as cellular data connections or SMS messages, displaying unwanted advertisements, facilitating the formation of botnets that launch large-scale attacks, and disclosing sensitive or personal data to third parties. Legitimate applications whose digital rights management or copy protection controls have been removed are referred to as cracked applications; such versions of legitimate applications quite often carry malicious payloads. The Google Play store is the official application store, from which most software vendors distribute their Android applications. Applications obtained from the official application store are assumed to be benign and are hereafter referred to as official applications when compared with their cracked counterparts.

The goal and motivation of this paper is to provide further insight into the price that users of smart devices actually pay when downloading cracked applications from unofficial, untrusted third-party stores. To this end, a sample of $25$ applications is used, and the official and cracked versions are compared against a number of indicators (permissions, CPU and RAM usage, and open TCP and HTTP ports). Although the permissions requested by Android applications have been analyzed in previous works, permissions alone are known to be insufficient for accurately identifying malicious intent. Therefore, we study the extent to which the combined indicators could considerably increase the accuracy of classifying a cracked application as benign or malicious. Our analysis is carried out on the three most popular versions of the Android OS: KitKat, Lollipop and Marshmallow. We observe that cracked applications request more permissions, with the extra permissions linked to malicious behavior, and tend to utilize more resources than the official applications. Moreover, our analysis illustrates that although newer versions of the Android OS manage resources (CPU and RAM) more efficiently, the differences between cracked and official applications in these indicators remain noticeable. Together with the number of open TCP and HTTP ports, this set of indicators helps to detect malicious intent more effectively.

The rest of the paper is organized as follows: related work in the area of Android malware analysis is given in Section 2, whilst the design of the experimental process is presented in Section 3. The main findings of our analysis and concluding remarks are provided in Sections 4 and 5 respectively.

2 Related work

The detection and analysis of malware on mobile devices has been an area of highly intensive research since their first appearance. The earliest attempts to create anti-malware systems were based on installing an agent on the mobile device, responsible for monitoring device activities and reporting them to a central system for further analysis. An early example of such a system was presented by [4]: an agent gathered communications data to detect possible abuse of the communication capabilities of the mobile device. A similar system, proposed by [10], requires the agent to mirror the state of the device and its communications on a cloud-based central system, where they are emulated and further analyzed. A system using crowdsourcing to gather application activity data was presented by [3]. A signature-based agent system was presented by [20], where an application malware score (AMS) was computed on the central system by summing the permission malware scores (PMS) of all requested permissions; the PMS values were calculated from official store applications and known malware.

A behavioral analysis approach was taken by [2], whose malware detection system relies on signatures generated by monitoring the actions performed by the suspect application, via system events and application programming interface (API) calls, and constructing a logical flow diagram. Moreover, [7] analyzed $46$ malware samples and evaluated existing anti-malware solutions for the Android, iOS and Symbian platforms. Interestingly, none of the iOS malware samples had been approved by Apple’s App Store, indicating both the need for and the effectiveness of human review.

In addition to the above lines of research, permission-based analysis of mobile applications has been proposed for detecting malware. Along this direction, $940$ applications were examined in [8] to determine whether the principle of least privilege had been followed. Nearly one third of the Android applications were found to violate this principle, which was attributed to developers misunderstanding the use of permissions and to the lack of clear API documentation. An extension of the Android security enforcement system that also considers the relationships between the requested permissions was proposed by [21], motivated by the fact that although individual permissions cannot indicate malicious intent, the relationships among them can be used to classify an application as malicious. A study of $125,229$ benign and malicious applications was carried out by [11] using the requested permissions as an indicator of intent. Comparing the detection rates of four machine learning algorithms, the authors concluded that although permissions alone can quickly classify applications as malicious or benign, a secondary analysis is required to make the final decision. A more extensive analysis of an application’s manifest file, based on searching for terms pertaining to permission requests, process names and identifiers, was proposed by [15]; they likewise concluded that textual analysis of the manifest file is resource efficient and, when combined with other techniques, can improve the accuracy of the analysis. Many works rely on binary classification to decide whether an application is malicious or benign, although in certain cases a finer granularity is needed to characterize an application more accurately. To address this, [9] proposed risk scoring functions that compute an overall value, which is subsequently used to characterize an application.
A permission-based system was presented by [12], which considers only permissions that are rarely requested by malicious or benign applications and uses machine learning to differentiate between the two classes.

Along with the development of anti-malware systems, the problem of classifying malware and security threats for mobile platforms has also been considered. An analysis of $1,260$ Android malware samples was conducted by [22] using an evolutionary-based approach. About $86\%$ of the samples were found to be repackaged; the authors therefore highlighted the need for reviewing the applications in Android application stores (not only the official one), just as [7] proposed. More recently, [14] conducted a more thorough review of the literature from 2008 to 2016, presenting taxonomies of the many different areas and approaches used in Android malware analysis.

3 Proposed methodology

To study the differences in behavior between benign applications downloaded from the official application store and their cracked counterparts, a sample of $n=25$ applications was used as a proof of concept, that is, to demonstrate that cracked applications often carry malicious payloads intended to harm the mobile device on which they are installed or its user. The applications in our sample, which are listed alphabetically in Table I, were randomly selected from two third-party stores (cracked applications were downloaded from CrackAPK.com and AppCake.net, both of which accept user-uploaded applications) and were tested on the three Android OS versions with the largest market share: KitKat, Lollipop and Marshmallow. Instead of analyzing the behavior of the sample applications in a simulated environment, the experimental setup involved three Samsung Galaxy mobile devices

• S3 neo with Android v4.4.2 (KitKat),
• S5 with Android v5.0 (Lollipop), and
• S7 with Android v6.0 (Marshmallow)

in order to study the applications’ behavior in real-life usage scenarios and to prevent potential malicious payloads from detecting the test environment, since such payloads often exhibit different behavioral patterns when they detect a simulated environment.

Our approach is based on assumptions about the behavioral patterns exhibited by applications. In particular, official applications are considered benign by default, as their intentions are stated in the official store and consent is given at install time; the same holds for cracked applications, which are assumed to be benign unless proven otherwise. Cracked applications that display significant behavioral deviations from the official ones are considered malicious. Note that a difference in a single indicator may not suggest malicious behavior, since small differences could well be attributed to bytecode deleted or inserted by the patching process; on the other hand, deviations in many indicators increase the likelihood of malicious intent. The requested permissions exhibit varying degrees of correlation with malicious intent, as evidenced in the literature [7, 9, 12, 22], and their study can therefore help classify an application as malicious or benign.

Malware analysis is usually performed either manually by a security analyst or automatically by dedicated software, and there are three prevalent approaches: static, dynamic and hybrid [16, 14]. Static methods focus on characteristics such as an application’s binary code, structure and metadata, while dynamic approaches analyze an application’s behavior during its execution. Hybrid techniques combine static and dynamic aspects to obtain a more complete view of the suspect application; our approach belongs to this category, since we measure both static and dynamic indicators. Because both the official and the cracked version of each application must be analyzed on three Android OS versions, we need indicators that can be measured accurately and efficiently. In particular, the indicators used in our study for each application $i=1,\ldots,n$ are the requested permissions $\boldsymbol{p}_{i}$, CPU usage $c_{i}$, RAM usage $r_{i}$, and the numbers of ports opened for TCP and HTTP communications, denoted by $t_{i}$ and $h_{i}$ respectively.

The first indicator, the requested permissions $\boldsymbol{p}_{i}$, can be obtained from the application manifest file. A total of $m=16$ permissions, listed in Table II, were tracked, namely the union of those requested by all applications; thus we let $\boldsymbol{p}_{i}=(p_{i1},\ldots,p_{im})$, where $p_{ij}\in\{0,1\}$ indicates whether the $j$th permission is requested by the $i$th application. Applications request permissions, by declaring them in the manifest file, to obtain access to hardware resources, e.g. the microphone or camera, and to restricted API functions [6]. Permissions are granted at install time or, from version 6.0 (Marshmallow) onwards, at run time, and are classified into three protection levels [1]: normal, dangerous and special. The manifest file AndroidManifest.xml is located inside the Android package (APK) file, which is the main format in which applications are distributed and installed on the Android OS. We used the Show Java application on the mobile devices to unpack the APK files and extract their contents.
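As a sketch of how $\boldsymbol{p}_{i}$ could be assembled programmatically, the snippet below reads the `uses-permission` entries of a manifest. It assumes that the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary XML format, so a tool such as Show Java is assumed to have run first). The `TRACKED` list is a hypothetical, shortened stand-in for Table II, which is not reproduced here, and `permission_vector` is an illustrative name, not part of the authors' tooling.

```python
import xml.etree.ElementTree as ET

# Attribute prefix for android:* attributes once ElementTree resolves the namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Hypothetical, shortened stand-in for the tracked permissions of Table II;
# list position j-1 corresponds to permission j.
TRACKED = [
    "android.permission.INTERNET",
    "android.permission.READ_EXTERNAL_STORAGE",
    "android.permission.WRITE_EXTERNAL_STORAGE",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.RECEIVE_SMS",
]


def permission_vector(manifest_xml: str, tracked: list[str]) -> list[int]:
    """Return (p_i1, ..., p_im), with p_ij = 1 iff tracked permission j is requested."""
    root = ET.fromstring(manifest_xml)
    requested = {el.get(ANDROID_NS + "name") for el in root.iter("uses-permission")}
    return [1 if name in requested else 0 for name in tracked]


# Hypothetical decoded manifest of one application version.
manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
</manifest>"""

print(permission_vector(manifest, TRACKED))  # [1, 0, 0, 0, 1, 0]
```

Running the function on the official and the cracked manifest of the same application yields the pair $\boldsymbol{p}_{i}^{o}$, $\boldsymbol{p}_{i}^{c}$ that the intention score compares.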

CPU and RAM usage measurements can be obtained from the Android OS application monitoring services (the means of access differs between Android OS versions). Differences in CPU and RAM usage between the official and a cracked version may indicate differences in their bytecode or memory consumption patterns. We used the values reported by the Android OS on each mobile device.

The number of open TCP and HTTP ports was obtained by inspecting the network traffic generated by each application. The hypertext transfer protocol (HTTP) is often used by malware to communicate with a command and control server, for example to receive new commands for data extraction or to download files onto the infected device. HTTP is also commonly used by legitimate applications to download resources and to access APIs available over the Internet, which makes its use less suspicious; moreover, HTTP traffic is widely allowed through network firewalls. The transmission control protocol (TCP), which provides, at layer 4 of the open systems interconnection (OSI) model, the connection-oriented and reliable data stream services that applications require for sending and receiving error-free data, was monitored to capture suspicious connections established by malicious payloads included in cracked applications. We used Wireshark on a computer on the same network as the mobile devices to capture and analyze the generated network traffic.
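A minimal sketch of how the two port counts could be derived once a capture has been exported from Wireshark, for example to CSV. Several assumptions are made here that the paper does not state: only the application under test generates traffic during the capture; each row is one TCP packet with hypothetical columns `src`, `sport`, `dst`, `dport`; and a device-side port counts toward the HTTP total when its remote port is 80 (a simplification, since HTTP can run on other ports). The device address and the name `count_open_ports` are illustrative.

```python
import csv
import io

DEVICE_IP = "192.168.1.23"  # hypothetical address of the phone under test
HTTP_PORTS = {80}           # assumption: HTTP identified by remote port 80


def count_open_ports(csv_text: str, device_ip: str = DEVICE_IP) -> tuple[int, int]:
    """Count distinct device-side TCP ports (t_i) and the subset used for HTTP (h_i)."""
    tcp_ports = set()
    http_ports = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Orient each packet so that `local` is the port on the device.
        if row["src"] == device_ip:
            local, remote = int(row["sport"]), int(row["dport"])
        elif row["dst"] == device_ip:
            local, remote = int(row["dport"]), int(row["sport"])
        else:
            continue  # traffic between other hosts on the network
        tcp_ports.add(local)
        if remote in HTTP_PORTS:
            http_ports.add(local)
    return len(tcp_ports), len(http_ports)


# Hypothetical capture excerpt (203.0.113.0/24 is a documentation range).
capture = """src,sport,dst,dport
192.168.1.23,40001,203.0.113.10,80
203.0.113.10,80,192.168.1.23,40001
192.168.1.23,40002,203.0.113.10,443
192.168.1.23,40003,203.0.113.20,80
"""
print(count_open_ports(capture))  # (3, 2)
```

Counting distinct local ports, rather than packets, keeps the reply packets of a connection from being counted twice.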

3.1 Application intention score

Based on the above, an application is characterized by a tuple $a_{i}=(\boldsymbol{p}_{i},c_{i},r_{i},t_{i},h_{i})$, $i=1,\ldots,n$, where $a_{i}^{o}$ and $a_{i}^{c}$ denote the profiles of the official and the cracked version, respectively. Since permissions exhibit varying degrees of correlation with malicious intent [7, 9, 12, 22], $k=3$ permission groups $\Pi_{l}$, $l=1,\ldots,k$, were defined to simplify the analysis. Group $\Pi_{1}$ contains the permissions considered highly indicative of malicious behavior, i.e. those in the set $\{1,10,\ldots,16\}$. Group $\Pi_{2}$ includes the permissions $\{6,7,8\}$, which could suggest malicious intent but are less strongly correlated with it than those in the first group. Finally, group $\Pi_{3}$ contains the remaining permissions $\{2,\ldots,5,9\}$, which are commonly requested by both malicious and benign applications. As in [9], we define a mapping that provides an overall value characterizing an application’s intentions; this is called the application intention score $s\in[-1,1]$ and is determined by

$$s_{i}=\sum_{l=1}^{k}w_{l}\,\delta_{il},\quad i=1,\ldots,n \tag{1}$$

where $w_{l}$ is the weight assigned to the permission group $\Pi_{l}$, with $w_{1}+\cdots+w_{k}=1$, and $\delta_{il}\in\{-1,0,1\}$. The term $\delta_{il}$ represents the group difference score of the $i$th application for group $\Pi_{l}$, given by

$$\delta_{il}=\operatorname{sgn}\left(\sum_{j\in\Pi_{l}}\bigl(p_{ij}^{o}-p_{ij}^{c}\bigr)\right) \tag{2}$$

where $\operatorname{sgn}(\cdot)$ is the signum function, with $\operatorname{sgn}(0)=0$ by convention. Note that $\delta_{il}<0$ if and only if the cracked application requests more permissions from group $\Pi_{l}$ than the official one. Moreover, we define $\mathcal{L}:[-1,1]\rightarrow L$ as

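$$\mathcal{L}(s)=\begin{cases}\ell_{1}:\text{“malicious”}, & \text{if } s<-0.4,\\ \ell_{2}:\text{“rather malicious”}, & \text{if } -0.4\leq s<0,\\ \ell_{3}:\text{“rather benign”}, & \text{if } 0\leq s\leq 0.4,\\ \ell_{4}:\text{“benign”}, & \text{otherwise},\end{cases} \tag{3}$$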

mapping the application intention score onto a set of classes (labels) $L=\{\ell_{1},\ldots,\ell_{4}\}$ that characterize cracked applications with respect to the difference in requested permissions. Using this classification, we next look for correlations with the other indicators measured in this analysis.
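The definitions (1) to (3) can be combined into a short sketch. The function names, the 0/1 list encoding of $\boldsymbol{p}_{i}$ (entry $j-1$ holds permission $j$), and the example permission sets are hypothetical; the groups follow the definitions above, and the example weights are those reported in Section 4. Scores are computed with Python's `fractions.Fraction` because the thresholds $\pm 0.4$ in (3) can be reached exactly (for instance $0.3\cdot(-1)+0.1\cdot(-1)=-0.4$), and binary floating point could place such a score on the wrong side of a boundary.

```python
from fractions import Fraction

# Permission groups Pi_1..Pi_3, using the paper's 1-based indices (m = 16).
GROUPS = [
    {1, 10, 11, 12, 13, 14, 15, 16},  # Pi_1: highly indicative of malicious behavior
    {6, 7, 8},                        # Pi_2: may suggest malicious intent
    {2, 3, 4, 5, 9},                  # Pi_3: common to benign and malicious apps
]


def sgn(x: int) -> int:
    """Signum function with sgn(0) = 0."""
    return (x > 0) - (x < 0)


def group_differences(p_off, p_crk, groups=GROUPS):
    """Group difference scores delta_il of eq. (2) for the official/cracked pair."""
    return [sgn(sum(p_off[j - 1] - p_crk[j - 1] for j in g)) for g in groups]


def intention_score(p_off, p_crk, weights, groups=GROUPS):
    """Application intention score s_i of eq. (1); weights must sum to exactly 1."""
    if sum(weights) != 1:
        raise ValueError("weights must sum to 1")
    deltas = group_differences(p_off, p_crk, groups)
    return sum(w * d for w, d in zip(weights, deltas))


def label(s) -> str:
    """Class mapping L(s) of eq. (3)."""
    if s < Fraction("-0.4"):
        return "malicious"
    if s < 0:
        return "rather malicious"
    if s <= Fraction("0.4"):
        return "rather benign"
    return "benign"


def vec(*requested):
    """Hypothetical helper: 0/1 vector over permissions 1..16."""
    return [1 if j in requested else 0 for j in range(1, 17)]


# Hypothetical permission sets, not taken from Table III.
official = vec(1, 2)
cracked = vec(1, 2, 6, 8, 10, 11, 12)
W = [Fraction("0.6"), Fraction("0.3"), Fraction("0.1")]

s = intention_score(official, cracked, W)
print(s, label(s))  # -9/10 malicious
```

Here the cracked version adds permissions from $\Pi_{1}$ and $\Pi_{2}$ only, so $(\delta_{i1},\delta_{i2},\delta_{i3})=(-1,-1,0)$ and $s_{i}=-0.9$.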

4 Experimental results

In this section, we present the results of our analysis for the sample applications used. The requested permissions per application (official and cracked ones) are listed in Table III.

The weights in (1) were assigned empirically as $w_{1}=0.6$, $w_{2}=0.3$ and $w_{3}=0.1$. Computing the application intention score and applying the mapping $\mathcal{L}$ to the cracked applications of Table I yields the following classes:
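With $w_{1}=0.6$, $w_{2}=0.3$ and $w_{3}=0.1$, only $3^{3}=27$ combinations of $(\delta_{i1},\delta_{i2},\delta_{i3})$ are possible, so the behavior of $\mathcal{L}$ can be checked exhaustively. The enumeration below is an illustrative check, not part of the original analysis, and again uses exact `Fraction` arithmetic so that scores equal to $\pm 0.4$ are classified as (3) prescribes.

```python
from fractions import Fraction
from itertools import product

W = (Fraction("0.6"), Fraction("0.3"), Fraction("0.1"))  # w_1, w_2, w_3


def label(s: Fraction) -> str:
    """Class mapping L(s) of eq. (3)."""
    if s < Fraction("-0.4"):
        return "malicious"
    if s < 0:
        return "rather malicious"
    if s <= Fraction("0.4"):
        return "rather benign"
    return "benign"


# Enumerate all 27 possible (delta_1, delta_2, delta_3) and group them by class.
by_label = {}
for deltas in product((-1, 0, 1), repeat=3):
    s = sum(w * d for w, d in zip(W, deltas))
    by_label.setdefault(label(s), []).append(deltas)

for name, combos in by_label.items():
    delta1_values = sorted({d[0] for d in combos})
    print(f"{name}: {len(combos)} combinations, delta_1 in {delta1_values}")
# malicious: 6 combinations, delta_1 in [-1]
# rather malicious: 7 combinations, delta_1 in [-1, 0]
# rather benign: 8 combinations, delta_1 in [0, 1]
# benign: 6 combinations, delta_1 in [1]
```

The output shows that a cracked application can fall into $\ell_{1}$ only when $\delta_{i1}=-1$, i.e. when it requests more $\Pi_{1}$ permissions than the official version, and into $\ell_{4}$ only when $\delta_{i1}=1$. This follows from $w_{1}>w_{2}+w_{3}$: whenever $\delta_{i1}\neq 0$, the sign of $s_{i}$ equals that of $\delta_{i1}$.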

• $\ell_{1}$ contains 1, 3–4, 6–7, 10–18, 22–23, and 25.
• $\ell_{2}$ contains 2, 19, and 20.
• $\ell_{3}$ contains 5, 8, and 9.
• $\ell_{4}$ contains 21 and 24.
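The class listings above use range notation. As an illustrative arithmetic check, the snippet below expands them, confirms that each of the $n=25$ applications appears in exactly one class, and recovers the share of cracked applications labeled malicious or rather malicious. The `expand` helper is hypothetical.

```python
def expand(listing: str) -> list[int]:
    """Expand a listing such as "1, 3-4, 6-7" into [1, 3, 4, 6, 7]."""
    ids = []
    for part in listing.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            ids.extend(range(lo, hi + 1))
        else:
            ids.append(int(part))
    return ids


# Class memberships as listed above (hyphens stand in for en-dashes).
CLASSES = {
    "malicious": "1, 3-4, 6-7, 10-18, 22-23, 25",
    "rather malicious": "2, 19, 20",
    "rather benign": "5, 8, 9",
    "benign": "21, 24",
}

members = {name: expand(spec) for name, spec in CLASSES.items()}
sizes = {name: len(ids) for name, ids in members.items()}

# Every one of the n = 25 applications should appear in exactly one class.
all_ids = sorted(i for ids in members.values() for i in ids)
assert all_ids == list(range(1, 26))

flagged = sizes["malicious"] + sizes["rather malicious"]
print(sizes)  # {'malicious': 17, 'rather malicious': 3, 'rather benign': 3, 'benign': 2}
print(f"{flagged}/25 = {flagged / 25:.0%}")  # 20/25 = 80%
```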

In general, cracked applications tend to request more permissions than the official applications, with an overall average of $7.36$ versus $2.64$ permissions. Nineteen cracked applications requested permissions 10–16, and eight requested Internet access (permission 1), even though their official versions did not; all of them are classified as malicious or rather malicious. Moreover, the SMS-related permissions (10–12) were requested together as a set; the same pattern was observed for permissions 6 and 8, which relate to read/write access to external storage.

The measured CPU usage (%) and RAM usage (MiB) across all Android OS versions are presented in Figures 1 and 2, respectively. In general, cracked applications tend to utilize slightly more CPU and RAM resources than their official counterparts. The overall average CPU usage is $3.25\%$ for cracked applications, compared with $2.93\%$ for official applications; the overall average RAM usage is $42.65$ MiB and $40.80$ MiB for cracked and official applications, respectively. In all figures, the box plots indicate the minimum and maximum values among the Android versions considered. With very few exceptions, the maximum (resp. minimum) values for both CPU and RAM usage were attained on KitKat (resp. Marshmallow), suggesting that these indicators are harder to exploit on newer versions of the Android OS, which manage the available resources more efficiently.

The numbers of open TCP and HTTP ports per application across all Android OS versions are presented in Figures 3 and 4, respectively. Clearly, in most cases cracked applications open more ports for both protocols than official applications. The overall average number of TCP ports opened is $131.19$ and $102.15$ for cracked and official applications respectively, whereas the overall average number of HTTP ports opened is $39.92$ and $20$ for cracked and official applications respectively.

The average usage overheads measured for cracked applications across all Android OS versions are presented in Table IV for each application class. We observe that cracked applications classified as malicious according to (3) in most cases utilize significantly more resources and request more permissions linked to malicious behavior (see group $\Pi_{1}$) than cracked applications classified as benign; this illustrates a clear difference between these two extremes. Furthermore, the overheads of cracked applications classified as rather benign or rather malicious reflect the uncertainty of our classification for these intermediate classes: rather malicious applications utilize less CPU and RAM resources, whereas rather benign applications generate significantly less network traffic, as evidenced by the numbers of additional TCP and HTTP ports opened. These differences seem to confirm a correlation between the classification used and the four remaining indicators. However, because the application sample was small, serving only to establish a proof of concept, further and more extensive testing of cracked applications is needed to demonstrate a positive impact on malware detection for the Android OS.
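The text does not spell out how the entries of Table IV are aggregated. One plausible reading, assumed in the sketch below, is that for each class the overhead of an indicator is the cracked-minus-official difference averaged over the applications in that class and the Android versions they were tested on. The `Measurement` type, the `class_overhead` function and all numbers are hypothetical placeholders, not the measured data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Measurement:
    """Resource indicators of one application version on one device."""
    cpu: float  # CPU usage c_i (%)
    ram: float  # RAM usage r_i (MiB)
    tcp: int    # open TCP ports t_i
    http: int   # open HTTP ports h_i


INDICATORS = ("cpu", "ram", "tcp", "http")


def class_overhead(runs, labels):
    """Mean cracked-minus-official difference per class and indicator.

    runs:   (app_id, official, cracked) tuples, one per app and Android version.
    labels: maps app_id to the class assigned by the intention score.
    """
    diffs = {}
    for app_id, off, crk in runs:
        per_class = diffs.setdefault(labels[app_id], {k: [] for k in INDICATORS})
        for k in INDICATORS:
            per_class[k].append(getattr(crk, k) - getattr(off, k))
    return {cls: {k: mean(v) for k, v in d.items()} for cls, d in diffs.items()}


# Hypothetical runs: app "A" on two Android versions, app "B" on one.
runs = [
    ("A", Measurement(2.0, 40.0, 100, 20), Measurement(2.5, 42.0, 140, 40)),
    ("A", Measurement(1.5, 35.0, 90, 18), Measurement(2.0, 37.0, 130, 38)),
    ("B", Measurement(3.0, 50.0, 110, 25), Measurement(3.0, 51.0, 105, 30)),
]
labels = {"A": "malicious", "B": "rather benign"}

overhead = class_overhead(runs, labels)
print(overhead["malicious"])
print(overhead["rather benign"])
```

A negative entry, such as the TCP overhead of the hypothetical app "B", means that the cracked version opened fewer ports than the official one, which is how a negative value like the rather benign TCP entry in Table IV can arise under this reading.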

5 Concluding remarks

This paper presented an empirical analysis of cracked applications running on Android platforms. The sample set consisted of $25$ applications whose cracked and official versions were compared against a number of indicators: the requested permissions, CPU usage, RAM usage, and the number of ports opened for TCP and HTTP. Using an application intention score function that relies on the permissions requested by an application, cracked applications were classified into groups associated with varying likelihoods of carrying malicious payloads. We also considered the extent to which the information provided by the other indicators can increase the accuracy of this classification.

Although deviations in CPU and RAM usage (resp. in the number of TCP and HTTP ports opened) alone are often not indicative of malicious behavior, pairing them with reliable malware detection methods can considerably increase the accuracy of those methods. Our preliminary results across all tested Android OS versions show that cracked applications on average request more permissions, tend to utilize slightly more CPU and RAM resources, and open more TCP and HTTP ports than official applications; about $80\%$ of the cracked applications were classified as malicious or rather malicious. These findings suggest that cracked applications have questionable intentions, that users should be vigilant when installing cracked and untrusted applications, and that human review is required in both official and third-party application stores.

Possible directions for future work include increasing the sample size to obtain statistically robust results and more accurate information about how the permissions and the values of the other indicators are distributed for each application type (cracked and official). Differences in these distributions could then be used to design more accurate application intention score functions and to help detect malicious payloads in cracked applications.

Bibliography


[1] Android Developers, Permissions Overview. Accessed: 28 Feb. 2018. Available: https://goo.gl/A7QG1J.
[2] A. Bose, X. Hu, K. G. Shin, and T. Park, “Behavioral detection of malware on mobile handsets,” in Proc. 6th Int’l Conf. Mobile Systems, Appl. and Services (ACM MobiSys ’08), pp. 225–238, 2008.
[3] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani, “Crowdroid: behavior-based malware detection system for Android,” in Proc. 1st ACM Wkshp Security and Privacy in Smartphones and Mobile Devices (ACM SPSM ’11), pp. 15–26, 2011.
[4] J. Cheng, S. H. Wong, H. Yang, and S. Lu, “Smartsiren: virus detection and alert for smartphones,” in Proc. 5th Int’l Conf. Mobile Systems, Appl. and Services (ACM MobiSys ’07), pp. 258–271, 2007.
[5] D. Dagon, T. Martin, and T. Starner, “Mobile phones as computing devices: the viruses are coming!,” IEEE Pervasive Computing, vol. 3, no. 4, pp. 11–15, 2004.
[6] N. Elenkov, Android Security Internals: An In-Depth Guide to Android’s Security Architecture, No Starch Press, 2015.
[7] A. P. Felt, M. Finifter, E. Chin, S. Hanna, and D. Wagner, “A survey of mobile malware in the wild,” in Proc. 1st ACM Wkshp Security and Privacy in Smartphones and Mobile Devices (ACM SPSM ’11), pp. 3–14, 2011.
[8] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner, “Android permissions demystified,” in Proc. 18th ACM Conf. Computer and Communications Security (ACM CCS ’11), pp. 627–638, 2011.