TL;DR
This paper introduces RAPIER, a robust framework for detecting encrypted malicious network traffic in low-quality datasets, utilizing data distribution insights and label correction to significantly improve detection performance.
Contribution
The paper presents RAPIER, a novel system that leverages distribution-based data augmentation and label noise correction to enhance encrypted malicious traffic detection in low-quality datasets.
Findings
RAPIER achieves high F1 scores on public datasets with noisy data.
It significantly outperforms existing methods, with reported improvements of over 200%.
It is also effective at detecting malicious traffic in real-world enterprise networks.
Abstract
Machine learning (ML) is promising for accurately detecting malicious flows in encrypted network traffic; however, it is challenging to collect a training dataset that contains a sufficient amount of encrypted malicious data with correct labels. When ML models are trained with low-quality training data, they suffer degraded performance. In this paper, we aim to address a real-world low-quality training dataset problem, namely, detecting encrypted malicious traffic generated by continuously evolving malware. We develop RAPIER, which augments the training data by fully exploiting the different distributions of normal and malicious traffic in the feature space: normal data is tightly distributed in a certain area, whereas malicious data is scattered over the entire feature space. RAPIER includes two pre-processing modules to convert traffic into feature vectors and…
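The distribution insight in the abstract (normal traffic packed tightly in one region, malicious traffic scattered across the feature space) can be illustrated with a toy label-correction sketch. This is not RAPIER's implementation: the abstract shown here does not say how density is estimated, so the sketch swaps in a plain k-nearest-neighbor distance as a density proxy. The 2-D synthetic features, the function names, and the `k` and `multiplier` parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_sparsity(features, reference_mask, k=5):
    """Mean distance from each sample to its k nearest reference samples.

    Low values mean the sample sits in a dense region of the reference set.
    """
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a sample is never its own neighbor
    ref_dists = np.sort(dists[:, reference_mask], axis=1)
    return ref_dists[:, :k].mean(axis=1)


def correct_labels(features, noisy_labels, k=5, multiplier=10.0):
    """Relabel samples by density: 0 = normal (dense), 1 = malicious (sparse).

    Assumption for this sketch: samples currently labeled normal are mostly
    correct, so their median sparsity describes the dense normal region.
    The multiplier is an arbitrary illustrative margin, not a tuned value.
    """
    normal_mask = noisy_labels == 0
    sparsity = knn_sparsity(features, normal_mask, k)
    threshold = multiplier * np.median(sparsity[normal_mask])
    return (sparsity > threshold).astype(int)


def make_noisy_dataset(seed=0, n_per_class=200, flip_fraction=0.2):
    """Synthetic data: a tight normal cluster, scattered malicious points,
    and a fixed fraction of labels flipped in each class."""
    rng = np.random.default_rng(seed)
    normal = rng.normal(0.0, 0.1, size=(n_per_class, 2))
    malicious = rng.uniform(-5.0, 5.0, size=(n_per_class, 2))
    features = np.vstack([normal, malicious])
    true_labels = np.repeat([0, 1], n_per_class)

    n_flip = int(flip_fraction * n_per_class)
    flip_idx = np.concatenate([
        rng.choice(n_per_class, n_flip, replace=False),
        n_per_class + rng.choice(n_per_class, n_flip, replace=False),
    ])
    noisy_labels = true_labels.copy()
    noisy_labels[flip_idx] = 1 - noisy_labels[flip_idx]
    return features, true_labels, noisy_labels


if __name__ == "__main__":
    features, true_labels, noisy_labels = make_noisy_dataset()
    corrected = correct_labels(features, noisy_labels)
    print("accuracy of noisy labels:    ", (noisy_labels == true_labels).mean())
    print("accuracy of corrected labels:", (corrected == true_labels).mean())
```

The sketch only works because the two classes differ in how they are distributed, which is the property the abstract relies on. A malicious sample that happens to land inside the dense normal region would still be relabeled as normal here.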
