Low-Quality Training Data Only? A Robust Framework for Detecting Encrypted Malicious Network Traffic

Yuqi Qing; Qilei Yin; Xinhao Deng; Yihao Chen; Zhuotao Liu; Kun Sun; Ke Xu; Jia Zhang; Qi Li

arXiv:2309.04798 · cs.CR · September 12, 2023

TL;DR

This paper introduces RAPIER, a robust framework for detecting encrypted malicious network traffic when only low-quality training data is available. It exploits the differing feature-space distributions of normal and malicious traffic, together with label correction, to significantly improve detection performance.

Contribution

The paper presents RAPIER, a novel system that leverages distribution-based data augmentation and label-noise correction to improve encrypted malicious traffic detection when training on low-quality datasets.
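
As a rough illustration of how the distribution property stated in the abstract (normal traffic tightly clustered, malicious traffic scattered) could support label-noise correction, here is a minimal Python sketch. It is not RAPIER's algorithm: it swaps in a simple k-nearest-neighbor density score, and the names `knn_density` and `correct_labels`, the `quantile` cutoff, the toy data, and the 0 = normal / 1 = malicious encoding are all hypothetical. Samples labeled normal that sit in sparse regions are flipped to malicious.

```python
import math
import random


def knn_density(points, k=5):
    """Hypothetical helper: inverse of the mean distance to the k nearest neighbors."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_d = sum(dists[:k]) / k
        scores.append(1.0 / (mean_d + 1e-12))  # epsilon avoids division by zero
    return scores


def correct_labels(points, labels, k=5, quantile=0.2):
    """Hypothetical relabeling rule (0 = normal, 1 = malicious).

    Assumes normal samples are densely clustered, so a sample labeled normal whose
    density is below the value at the `quantile` position of all sorted densities
    is treated as a mislabeled malicious sample and flipped to 1.
    """
    scores = knn_density(points, k)
    idx = min(int(quantile * len(scores)), len(scores) - 1)
    cutoff = sorted(scores)[idx]
    return [1 if (y == 0 and s < cutoff) else y for s, y in zip(scores, labels)]


# Toy data: 40 normal points in a tight cluster, 10 scattered malicious points.
rng = random.Random(0)
normal = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(40)]
malicious = [(8, 8), (-8, 7), (9, -6), (-7, -9), (0, 9),
             (9, 0), (-9, 1), (3, -8), (-4, 6), (6, 4)]
points = normal + malicious
noisy = [0] * len(points)  # every malicious sample is mislabeled as normal
corrected = correct_labels(points, noisy)
print(corrected)
```

A k-NN score is used here only because it needs no training; on real traffic features a learned density estimator would plausibly work better, and the quantile would have to be tuned to the expected noise rate.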

Findings

1. RAPIER achieves high F1 scores on public datasets with noisy data.
2. It substantially outperforms existing methods, with reported improvements of over 200%.
3. It is also effective at detecting malicious traffic in real-world enterprise data.

Abstract

Machine learning (ML) is promising for accurately detecting malicious flows in encrypted network traffic; however, it is challenging to collect a training dataset that contains a sufficient amount of encrypted malicious data with correct labels. When ML models are trained with low-quality training data, their performance degrades. In this paper, we address a real-world low-quality training dataset problem, namely, detecting encrypted malicious traffic generated by continuously evolving malware. We develop RAPIER, which fully utilizes the different distributions of normal and malicious traffic data in the feature space (normal data is tightly distributed in a certain area, whereas malicious data is scattered over the entire feature space) to augment the training data for model training. RAPIER includes two pre-processing modules to convert traffic into feature vectors and…
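
The abstract states the distribution property but not the augmentation procedure itself, so the following is only a toy Python sketch of how that property could drive augmentation. The `augment` function, its `jitter`, `min_dist`, and padding parameters, and the rejection-sampling scheme are hypothetical and swapped in for illustration; this is not the paper's method. New normal samples are small Gaussian perturbations of real normal samples (relying on their tight clustering), while new malicious samples are drawn uniformly from an enlarged bounding box and kept only if they lie far from every real normal sample (relying on the scattered malicious distribution).

```python
import math
import random


def augment(normal, n_new_normal, n_new_malicious, jitter=0.05, min_dist=1.0, seed=0):
    """Hypothetical distribution-based augmentation.

    normal: list of equal-length tuples (real normal feature vectors).
    Returns (new_normal, new_malicious), two lists of synthetic tuples.
    """
    rng = random.Random(seed)
    dim = len(normal[0])

    # Normal data is assumed to be tightly clustered: jitter real normal points slightly.
    new_normal = []
    for _ in range(n_new_normal):
        base = rng.choice(normal)
        new_normal.append(tuple(x + rng.gauss(0.0, jitter) for x in base))

    # Malicious data is assumed to be scattered: sample uniformly in a box that
    # extends `pad` beyond the normal data on every side (a hypothetical choice).
    pad = 10 * min_dist
    lo = [min(p[d] for p in normal) - pad for d in range(dim)]
    hi = [max(p[d] for p in normal) + pad for d in range(dim)]
    new_malicious = []
    while len(new_malicious) < n_new_malicious:
        cand = tuple(rng.uniform(lo[d], hi[d]) for d in range(dim))
        # Rejection step: keep only candidates at least min_dist from every normal point.
        if min(math.dist(cand, p) for p in normal) >= min_dist:
            new_malicious.append(cand)
    return new_normal, new_malicious


rng = random.Random(1)
normal = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(30)]
new_normal, new_malicious = augment(normal, n_new_normal=20, n_new_malicious=15)
print(len(new_normal), len(new_malicious))  # 20 15
```

Rejection sampling keeps the sketch short; in high-dimensional feature spaces uniform box sampling becomes inefficient and uninformative, which is one reason a practical system would likely rely on a learned generator instead.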

Code & Models

Repositories

xxnormal/rapier (PyTorch, official)
