APT Encrypted Traffic Detection Method based on Two-Parties and Multi-Session for IoT

Junfeng Xu; Weiguo Lin; Wenqing Fan

arXiv:2302.13234·cs.CR·February 28, 2023

TL;DR

This paper proposes a novel APT encrypted traffic detection method for IoT that uses minimal features, converts traffic data into images, and employs CNNs for identification, demonstrating promising experimental results.

Contribution

Introduces a two-parties and multi-session based detection approach that simplifies feature extraction and leverages image recognition for encrypted traffic detection.

Findings

01 Achieves high detection accuracy in preliminary tests

02 Uses minimal feature set for effective detection

03 Verifies method's effectiveness through experiments

Tables

Table 1. Content of Traffic Data Set

| Label | Type | Stage / Application | Number of Session Groups |
|---|---|---|---|
| APT Flow | APT Group | C&C | 3500 |
| Normal Flow | Browser | Chrome | 5000 |
| Normal Flow | Mail | Outlook | 5000 |
| Normal Flow | Office | Excel | 5000 |
| Normal Flow | Video | Youku | 5000 |

Table 2. Experimental result data (percentage)

| Data Set | Accuracy | Precision | Recall Ratio | F1 Value |
|---|---|---|---|---|
| 1: APT-C&C, Chrome | 95.3 | 96.3 | 92.9 | 94.6 |
| 2: APT-C&C, Outlook | 99.8 | 99.6 | 99.9 | 99.7 |
| 3: APT-C&C, Excel | 92.0 | 89.0 | 96.5 | 92.6 |
| 4: APT-C&C, Youku | 92.1 | 97.1 | 91.1 | 94.0 |
| 5: APT-C&C, Mixed Normal Flow | 96.1 | 97.0 | 95.5 | 96.2 |

Equations

$$A = \frac{TP + TN}{TP + FP + FN + TN}$$

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$F_{1} = \frac{2PR}{P + R}$$

Full text

Communication University of China, Beijing, China, 100085

Email: [email protected]

Abstract

APT traffic detection is an important task in network security and is of great significance for enterprise security. Most APT traffic uses encrypted communication protocols as its transport, which greatly increases the difficulty of detection. This paper analyzes the problems of current machine-learning-based APT encrypted traffic detection methods and proposes an APT encrypted traffic detection method based on two parties and multi-session. The method needs to extract only a small number of features, such as session sequence, session time interval, and upstream and downstream data size, and converts them into images. A convolutional neural network then performs image recognition, which in turn identifies the network traffic. In a preliminary test of five experiments, the method achieves good results, which verifies its effectiveness.

I Introduction

Advanced Persistent Threat (APT) refers to a hidden and persistent process of computer network intrusion, usually driven by commercial or political motives, that targets specific organizations or countries and remains highly concealed for a long time [1]. An APT has three elements: advanced, long-term, and threat. "Advanced" implies the use of sophisticated malware and techniques to exploit vulnerabilities in the system. "Long-term" implies that an external force continuously monitors a specific target and obtains data from it. "Threat" refers to attack behavior planned by humans. Detecting APT traffic in enterprise network environments is therefore of great significance for network security.

Traditional APT traffic detection methods rely heavily on characteristic string detection, which uses a set of key strings obtained by analyzing captured APT samples or traffic, such as domain names, URLs, and specific character sequences. As the detection basis of an intrusion detection system (IDS) or intrusion prevention system (IPS), the characteristic strings work as Indicators of Compromise (IoCs) and are matched directly against the traffic to recognize threats. However, in recent years more and more APTs have come to use encryption protocols such as TLS for communication, so most application-layer data is encrypted in transit and this traditional characteristic string detection method is failing.

Machine learning methods can recognize traffic by statistical learning over a feature set, and in recent years great progress has been made in encrypted traffic identification. For TLS encrypted traffic, non-ciphertext features such as spatio-temporal features, handshake features, and certificate features have brought some success in malicious traffic detection. However, much current research and many products use a session or flow (hereinafter referred to as a flow) determined by a four-tuple (source IP, source port, destination IP, destination port; hereinafter referred to as a quadruple) as the basic identification unit. This approach has difficulty capturing the multi-session communication features that are common in APT communication, so its identification performance is limited in many scenarios.

To solve the above problems, we propose an APT encrypted traffic detection method based on two parties and multi-session. From the multi-session data of two communication parties over a certain period of time, the method extracts multiple recognizable features and transforms them into image data. Then, exploiting the strength of deep learning in image recognition, we design a convolutional neural network to recognize the images and thereby indirectly identify the flows. We tested the method on the encrypted traffic of an APT group and the encrypted traffic of normal network applications. The experimental results show that the method achieves good results in accuracy and false positive rate.

The remainder of this paper is organized as follows: Section II introduces related work and explains the origin of our ideas; Section III introduces the overall technical roadmap of the work; Section IV presents the experimental results and analysis; finally, we conclude in Section V.

II Related Work

The traditional APT traffic detection method that relies on feature string matching is relatively mature in industry and is used in many IDS and IPS products. For example, Snort [2] in the early days and Suricata [3] more recently both detect threats by matching a set of custom rules. Such products often rely on deep packet inspection (DPI), which uses protocol parsing to extract metadata from network flows and takes the metadata as the detection unit. Finsterbusch et al. [4] summarized the current traffic identification methods based on DPI.

In recent years, there has been a lot of research on machine-learning-based APT traffic detection. Anderson et al. published relevant results in 2016 [5] and 2017 [6], using machine learning algorithms such as random forest to detect threats in encrypted traffic. Some commercial products based on this approach have since appeared, such as Cisco’s Stealthwatch [7] and Huawei’s Agile Campus Network [8], which achieve practical results in specific application scenarios. All of the above work takes a single session as the identification unit. An APT attack, however, exhibits some features that appear only across multiple sessions and cannot be seen in a single session. For example, in the Command & Control (C&C) stage [9], there are many heartbeat sessions or multiple data-theft sessions between the two communication parties within a certain period of time. These sessions have many identifiable features in terms of interaction sequence, data size, upload/download ratio, and so on. However, current machine learning methods have not made full use of these features.

Based on the above analysis, we propose an APT encrypted traffic detection method based on two parties and multi-session, which tries to make full use of the multi-session features not yet used in current research and thereby achieve more accurate detection results.

III Methodology

Since the traffic detection method we proposed has special requirements for training data, we will first introduce the specially created data set, then explain the traffic image conversion method, and finally introduce the CNN model architecture we used.

III-A Data Set

The detection method in this paper needs multi-session data between two communication parties of the same application type. Most public traffic analysis data sets contain preprocessed feature data, such as the classic KDD CUP 1999 [10]. Among the few data sets that provide raw traffic data, such as USTC-TFC-2016 [11], we found none that meets the requirements of our method after analysis and comparison. To carry out the experiments, we therefore used internally collected data to construct traffic data sets that meet these conditions. The data sets consist of two parts: the first is the traffic data of an APT group, specifically from the command and control phase, which generally has more sessions; the second is four types of normal application traffic data, covering browser, mail, Microsoft Office, and video. The traffic sessions of each type are grouped by the two communication parties. The APT traffic set contains 3500 groups, and each normal traffic set contains 5000 groups. The details are shown in Table 1.

III-B Traffic Image Conversion

Converting the original traffic data into the image data to be classified takes four steps: traffic analysis, session grouping, feature extraction, and image conversion.

III-B1 Traffic Analysis

Traffic analysis is the basic work of traffic classification. Traffic is continuous data that must be divided into discrete units according to certain rules before it can be classified. At present, the mainstream method is to divide the traffic into sessions according to the four-tuple and to treat each session as an independent data unit for classification. Following our technical roadmap, we also first divide the traffic into sessions. After this step, the continuous input traffic has been converted into a set of discrete data units, one per session. Suppose the input traffic is $T$; then the output is the session set $S=\{s_{1},s_{2},s_{3},...,s_{n}\}$, where $s_{1}$ to $s_{n}$ are the data of each session and $n$ is the total number of sessions.
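The session-splitting step can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Packet` record, its field names, and the direction-independent key are assumptions made for the example, and the sketch does not handle a four-tuple being reused by a later connection.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed-packet record; field names are illustrative."""
    ts: float          # capture timestamp in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload_len: int   # application-layer bytes carried by this packet


def flow_key(p):
    """Four-tuple key that is identical for both directions of a connection."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    return (a, b) if a <= b else (b, a)


def split_sessions(packets):
    """Turn traffic T (a list of packets) into the session set S."""
    sessions = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.ts):
        sessions[flow_key(p)].append(p)
    # Sessions come out in order of their first packet.
    return list(sessions.values())
```

Sorting the key endpoints is one simple way to make request and response packets land in the same session; a real parser would more likely track TCP connection state.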

III-B2 Session Grouping

The session units obtained from the traffic analysis step are further grouped by the two sides of the communication, where a pair of communication parties is identified by the triplet of both parties' IP addresses and the server port. Compared with the four-tuple used in traffic analysis, the only difference is that the client port is ignored. A typical scenario is a user whose browser repeatedly visits the same HTTPS website within a fixed period of time, which generates multiple TLS sessions. The client ports of these sessions differ because they are randomly selected each time. However, the web server has a constant IP address, the client accessing the website also has a constant IP address, and the server port is fixed at $443$, so these sessions can be grouped into one set. Intuitively, these sessions should have similar data properties. Similar scenarios exist in APT encrypted traffic; for example, the heartbeat sessions that inform the command and control server that the client is alive have similar properties. For this step, the input is the session set $S=\{s_{1},s_{2},s_{3},...,s_{n}\}$ and the output is $G=\{g_{1},g_{2},...,g_{m}\}$, where $g_{1}$ to $g_{m}$ are session groups, each containing several sessions arranged in chronological order of their first frame, and $m$ is the number of session groups, that is, the number of pairs of communication parties.
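A minimal sketch of the grouping step, assuming each session has already been summarized into a record. The `Session` type and its field names are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Session(NamedTuple):
    """Hypothetical per-session summary; field names are illustrative."""
    client_ip: str
    client_port: int
    server_ip: str
    server_port: int
    first_ts: float    # timestamp of the first frame
    last_ts: float     # timestamp of the last frame
    up_bytes: int      # application-layer bytes, client to server
    down_bytes: int    # application-layer bytes, server to client


def group_sessions(sessions):
    """Group the session set S into G, keyed by (client IP, server IP, server port)."""
    groups = defaultdict(list)
    for s in sessions:
        # The client port is deliberately left out of the key.
        groups[(s.client_ip, s.server_ip, s.server_port)].append(s)
    for members in groups.values():
        members.sort(key=lambda s: s.first_ts)  # chronological by first frame
    return dict(groups)
```

Two TLS sessions from the same client to the same server on port 443 fall into one group even though their client ports differ, which is the browser scenario described above.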

III-B3 Feature Extraction

Each session group in the grouping result is the basic data unit for the subsequent traffic classification and identification. The feature extraction step extracts a set of features for each unit, which are used in the subsequent image conversion step. The extracted features are session temporal relation, session time interval, and up/down data ratio. Session temporal relation refers to the order of the sessions. Extracting it is relatively simple: the first-frame data of all sessions in a group can be used directly, so the output is a set of time series data. Session time interval refers to the interval between each two consecutive sessions in a group, specifically between the last-frame timestamp of one session and the first-frame timestamp of the following session. The up/down data ratio refers to the ratio of the bytes sent from client to server to the bytes sent from server to client. Because in the TLS protocol it is the application-layer data that truly reflects the data exchange, only application-layer data is counted, that is, the data actually transmitted by the two parties after TLS key negotiation. For this step, the input is the session group data $G=\{g_{1},g_{2},...,g_{m}\}$ and the output is the feature set data $F=\{f_{1},f_{2},...,f_{m}\}$, where each $f_{i}$ is a feature set containing the three types of feature data: session temporal relation, session time interval, and up/down data ratio.
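The three features can be computed per group roughly as follows. This is a sketch under stated assumptions: each session is represented as a plain `(first_ts, last_ts, up_bytes, down_bytes)` tuple, overlapping sessions are given an interval of zero, and a session with no downstream bytes gets an infinite ratio. None of these conventions are specified in the paper.

```python
def extract_features(group):
    """Compute one feature set f_i for one session group g_i.

    `group` is a list of (first_ts, last_ts, up_bytes, down_bytes) tuples,
    a hypothetical representation chosen for this example.
    """
    ordered = sorted(group, key=lambda s: s[0])

    # Session temporal relation: first-frame timestamps in order.
    start_times = [s[0] for s in ordered]

    # Session time interval: last frame of one session to the first frame
    # of the next. Clamping overlaps to 0 is an assumption of this sketch.
    intervals = [
        max(0.0, nxt[0] - cur[1]) for cur, nxt in zip(ordered, ordered[1:])
    ]

    # Up/down data ratio: client-to-server over server-to-client bytes.
    # Using infinity for zero downstream bytes is also an assumption.
    ratios = [s[2] / s[3] if s[3] else float("inf") for s in ordered]

    return {
        "start_times": start_times,
        "intervals": intervals,
        "up_down_ratio": ratios,
    }
```

A group of $k$ sessions yields $k$ start times, $k-1$ intervals, and $k$ ratios.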

III-B4 Image Conversion

Image conversion refers to converting the output of the feature extraction step into images that visually reflect the three types of feature data mentioned above. An example image is shown in Figure 1.

The figure shows the multi-session traffic image of two communication parties. The two ends represent the two parties: the left side is the client, labeled with the client IP, and the right side is the server, labeled with both the server IP and the server port. The column chart in the middle is the data exchanged between the two parties, with each column representing one session. The parts of each column above and below the horizontal axis are the up and down data: the length above the axis represents the bytes of application-layer data sent from the client to the server, and the length below represents the bytes of application-layer data sent from the server to the client. The order of the columns is the order of the sessions, and the spacing between columns reflects the session time interval. In summary, each pair of communication parties produces one such image. Intuitively, different types of traffic show distinct image features that can be used to distinguish and identify them.
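One possible way to rasterize such a column chart is sketched below. The paper does not give the image size, the scaling, or the pixel encoding, so the 28 by 28 grid, linear scaling to the largest byte count, placement of columns by session start time, and 0/255 pixel values are all assumptions of this example. The IP and port labels at the two ends are omitted.

```python
def sessions_to_image(sessions, size=28):
    """Render one session group as a size x size grayscale grid.

    `sessions` is a list of (start_ts, up_bytes, down_bytes) tuples sorted
    by start_ts, a hypothetical representation chosen for this example.
    Returns a list of rows with pixel values 0 (background) or 255 (ink).
    """
    img = [[0] * size for _ in range(size)]
    if not sessions:
        return img

    axis = size // 2                         # row of the horizontal axis
    t0 = sessions[0][0]
    span = (sessions[-1][0] - t0) or 1.0     # avoid dividing by zero
    peak = max(max(up, down) for _, up, down in sessions) or 1

    for ts, up, down in sessions:
        # Column position follows time, so gaps reflect session spacing.
        x = round((ts - t0) / span * (size - 1))
        up_h = round(up / peak * axis)
        down_h = round(down / peak * (size - axis - 1))
        img[axis][x] = 255                   # mark the session on the axis
        for y in range(axis - up_h, axis):   # client-to-server bytes, above
            img[y][x] = 255
        for y in range(axis + 1, axis + 1 + down_h):  # server-to-client, below
            img[y][x] = 255
    return img
```

Linear scaling makes a single very large transfer flatten all other columns; a logarithmic scale would be an alternative, but the paper does not say which was used.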

III-C Convolution Neural Network Architecture

CNN (Convolutional Neural Network) is currently the mainstream deep learning model for image classification and has achieved excellent results in many application scenarios. Considering the complexity of the images and the number of training samples, we use the LeNet-5 [12] convolutional neural network architecture, shown in Figure 2.

The CNN reads the pixel values from the image file, and these values are normalized from the range $0$ to $255$ to the range $0$ to $1$. In the first convolution layer $C_{1}$, the input is convolved with a kernel of size $5 \times 5$; $32$ channels generate $32$ feature maps of size $28 \times 28$, and a $2 \times 2$ max pooling operation in layer $P_{1}$ then produces $32$ feature maps of size $14 \times 14$. In the second convolution layer $C_{2}$, the kernel size is also $5 \times 5$, but $64$ channels generate $64$ feature maps of size $14 \times 14$. A $2 \times 2$ max pooling operation in layer $P_{2}$ then produces $64$ feature maps of size $7 \times 7$. Next come two fully connected layers, which convert the data size to $1024$ and then $10$. Finally, a softmax function outputs the class probabilities. To reduce overfitting, dropout is applied before the output layer.
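The layer sizes above can be checked with a few lines of shape arithmetic. The sketch assumes a $28 \times 28$ input and stride-1 convolutions with padding that preserves the spatial size; both are inferred from the stated feature-map sizes rather than given explicitly in the text.

```python
def conv_same(h, w, out_channels):
    """Stride-1 convolution with size-preserving padding (assumed)."""
    return h, w, out_channels


def max_pool(h, w, c, k=2):
    """Non-overlapping k x k max pooling."""
    return h // k, w // k, c


def network_shapes(h=28, w=28):
    """Trace feature-map shapes through C1, P1, C2, P2 and the dense layers."""
    shapes = {}
    shape = conv_same(h, w, 32)
    shapes["C1"] = shape
    shape = max_pool(*shape)
    shapes["P1"] = shape
    shape = conv_same(shape[0], shape[1], 64)
    shapes["C2"] = shape
    shape = max_pool(*shape)
    shapes["P2"] = shape
    shapes["flatten"] = shape[0] * shape[1] * shape[2]
    shapes["FC1"] = 1024
    shapes["FC2"] = 10
    return shapes


if __name__ == "__main__":
    for name, shape in network_shapes().items():
        print(name, shape)
```

Under these assumptions the first fully connected layer receives $64 \times 7 \times 7 = 3136$ values.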

IV Evaluation

We use Keras and TensorFlow as the training platform, running on the 64-bit Ubuntu 18.04 operating system. $2/10$ of the data was randomly selected as the test data, and the remaining $8/10$ was used for training. We evaluate the proposed method with the following criteria: accuracy ($A$), precision ($P$), recall ($R$), and $F_{1}$ score ($F_{1}$), defined as follows:

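$$A = \frac{TP + TN}{TP + FP + FN + TN}$$

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$F_{1} = \frac{2PR}{P + R}$$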

Here, true positive $TP$ is the number of target flows correctly identified, true negative $TN$ is the number of other flows correctly identified, false positive $FP$ is the number of other flows wrongly identified as target flows, and false negative $FN$ is the number of target flows missed.

The experimental results are shown in Table 2. The precision, recall and F1 values refer to the corresponding result data of APT-C&C flow.
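The four criteria follow directly from the confusion counts, as the short sketch below shows. The counts in the usage example are made up for illustration and are not taken from the paper; the function names are likewise hypothetical, and the code does not guard against zero denominators.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1 from precision and recall given on the same scale."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    print(classification_metrics(tp=90, fp=10, fn=5, tn=95))
    # Precision and recall of data set 1 in Table 2, in percent.
    print(round(f1_from(96.3, 92.9), 1))  # 94.6
```

Applying `f1_from` to the precision and recall of data set 1 in Table 2 reproduces the reported F1 value of 94.6.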

The results show that the method performs well in all five binary classification tasks, with every accuracy above 90%. The highest accuracy, 99.8%, was achieved in the APT-C&C versus Outlook experiment. In the fifth experiment, the normal flow set is a random mix of the four normal flow types, which is closer to a real application scenario, and the accuracy is 96.1%. In conclusion, these results preliminarily verify the effectiveness of our proposed APT encrypted traffic identification method based on two parties and multi-session.

V Conclusion

To address the problem of APT encrypted traffic identification in network security, this paper proposes an APT encrypted traffic identification method based on two parties and multi-session. The method does not need complex feature engineering; it only extracts the multi-session temporal relation, time interval, and up/down data ratio, converts them into image data, recognizes the images with a convolutional neural network, and thereby identifies the flows. The experimental results show that the method achieves good results in several binary classification experiments, which verifies its effectiveness. In the next stage of our work, we will use more types of data for verification and extend the experiments to multi-class scenarios to further explore the potential of this method.

Acknowledgment

This work is supported by the Fundamental Research Funds for the Central Universities (2018XNG1815) and MCM20180504.

Bibliography

[1] Daly, M.K., 2009. Advanced persistent threat. Usenix, Nov, 4(4), pp.2013-2016.
[2] Roesch, M., 1999, November. Snort: Lightweight intrusion detection for networks. In LISA (Vol. 99, No. 1, pp. 229-238).
[3] Park, W. and Ahn, S., 2017. Performance comparison and detection analysis in snort and suricata environment. Wireless Personal Communications, 94(2), pp.241-252.
[4] Finsterbusch, M., Richter, C., Rocha, E., Muller, J.A. and Hanssgen, K., 2013. A survey of payload-based traffic classification approaches. IEEE Communications Surveys & Tutorials, 16(2), pp.1135-1156.
[5] Anderson, B. and McGrew, D., 2016, October. Identifying encrypted malware traffic with contextual flow data. In Proceedings of the 2016 ACM workshop on artificial intelligence and security (pp. 35-46).
[6] Anderson, B. and McGrew, D., 2017, August. Machine learning for encrypted malware traffic classification: accounting for noisy labels and non-stationarity. In Proceedings of the 23rd ACM SIGKDD International Conference on knowledge discovery and data mining (pp. 1723-1732).
[7] Radhakrishnan, S., 2017. Detect threats in encrypted traffic without decryption, using network based security analytics.
[8] Huawei agile campus network encryption communication analysis (ECA). https://e.huawei.com, 2018.