A Pipeline of Augmentation and Sequence Embedding for Classification of Imbalanced Network Traffic
Matin Shokri, Ramin Hasibi

TL;DR
This paper introduces a pipeline for classifying imbalanced network traffic. It balances the dataset with synthetic data generated by LSTM networks and KDE, and replaces one-hot encoding with a new embedding method, FS-Embedding, reducing model complexity without sacrificing accuracy.
Contribution
The paper presents a new augmentation pipeline and the FS-Embedding technique, which together enhance classification of imbalanced network traffic data, improve convergence, and reduce the number of model parameters.
Findings
Augmentation with LSTM and KDE improves dataset balance.
FS-Embedding outperforms one-hot encoding in classification tasks.
Pipeline reduces model complexity while maintaining accuracy.
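One way to see why a learned embedding can shrink a model relative to one-hot encoding is to count first-layer weights. The sketch below is illustrative arithmetic only: the two port fields, the hidden width of 128, the embedding dimension of 16, and the shared embedding table are all assumptions made for this example, not the architecture or figures reported in the paper.

```python
# Illustrative weight count: one-hot input vs. a shared embedding table.
# All sizes below are assumptions for this sketch, not values from the paper.

PORT_VOCAB = 65536  # TCP/UDP port numbers are 16-bit values


def one_hot_first_layer_weights(n_fields, vocab, hidden):
    """Weights in a dense layer fed by n_fields concatenated one-hot vectors."""
    return n_fields * vocab * hidden


def embedding_first_layer_weights(n_fields, vocab, dim, hidden):
    """Weights in one shared embedding table plus the dense layer after it."""
    table = vocab * dim              # one table shared by all port fields
    dense = n_fields * dim * hidden  # dense layer over concatenated embeddings
    return table + dense


# Two categorical fields (source port, destination port); biases are ignored.
one_hot = one_hot_first_layer_weights(n_fields=2, vocab=PORT_VOCAB, hidden=128)
embedded = embedding_first_layer_weights(n_fields=2, vocab=PORT_VOCAB, dim=16, hidden=128)

print(f"one-hot:   {one_hot:,} weights")
print(f"embedding: {embedded:,} weights")
```

Under these assumed sizes the one-hot variant needs roughly sixteen times as many first-layer weights, which is the general mechanism behind the complexity reduction claimed above.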
Abstract
Network Traffic Classification (NTC) is one of the most important tasks in network management. The imbalanced nature of classes on the internet presents a critical challenge in classification tasks. For example, some classes of applications, such as HTTP, are much more prevalent than others. As a result, machine learning classification models do not perform well on classes with less data. To address this problem, we propose a pipeline to balance the dataset and classify it using a robust and accurate embedding technique. First, we generate artificial data using Long Short-Term Memory (LSTM) networks and Kernel Density Estimation (KDE). Next, we propose replacing one-hot encoding for categorical features with a novel embedding framework based on the "Flow as a Sentence" perspective, which we name FS-Embedding. This framework treats the source and destination ports, along with the…
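The KDE half of the augmentation step can be sketched in a few lines. Sampling from a Gaussian KDE amounts to picking a stored data point at random and adding Gaussian noise whose standard deviation equals the bandwidth. The example below is a minimal sketch under stated assumptions: the function names, the toy flow features (mean packet size, flow duration), the bandwidth, and the choice to top up every class to the size of the largest one are hypothetical, and the excerpt does not say how the paper divides the work between the LSTM and KDE.

```python
import random


def sample_gaussian_kde(points, bandwidth, n_samples, rng):
    """Draw n_samples from a Gaussian KDE fitted to `points`.

    Each draw picks one stored point uniformly at random and adds
    zero-mean Gaussian noise (std = bandwidth) to every coordinate.
    """
    samples = []
    for _ in range(n_samples):
        center = rng.choice(points)
        samples.append([rng.gauss(x, bandwidth) for x in center])
    return samples


def balance_with_kde(rows_by_class, bandwidth=0.5, seed=0):
    """Top up every class with KDE samples until all match the largest class."""
    rng = random.Random(seed)
    target = max(len(rows) for rows in rows_by_class.values())
    balanced = {}
    for label, rows in rows_by_class.items():
        extra = sample_gaussian_kde(rows, bandwidth, target - len(rows), rng)
        balanced[label] = list(rows) + extra
    return balanced


# Hypothetical toy data: [mean packet size in bytes, flow duration in seconds].
flows = {
    "http": [[500.0, 1.2], [520.0, 1.0], [480.0, 1.5], [510.0, 1.1]],
    "ftp": [[900.0, 8.0]],
}
balanced = balance_with_kde(flows)
print({label: len(rows) for label, rows in balanced.items()})
```

A single shared bandwidth is used here for brevity. In practice the features would usually be standardized first, or a per-feature bandwidth chosen, so that the noise is on a sensible scale for each column.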
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Network Security and Intrusion Detection · Imbalanced Data Classification Techniques
