Revisiting Network Traffic Analysis: Compatible network flows for ML models
João Vitorino, Daniela Pinto, Eva Maia, Ivone Amorim, Isabel Praça

TL;DR
This paper explores how analyzing raw network packets and extracting consistent features can improve the training and robustness of ML models for cyberattack detection in IoT networks.
Contribution
It demonstrates that preprocessing PCAP files to generate new features enhances model performance and compatibility across datasets compared to using only original CSV data.
Findings
Analyzing PCAP files yields more relevant features for ML training.
Preprocessed features improve detection accuracy of ensemble models.
Enhanced feature extraction aids in dataset compatibility and model robustness.
Abstract
To ensure that Machine Learning (ML) models can perform a robust detection and classification of cyberattacks, it is essential to train them with high-quality datasets with relevant features. However, it can be difficult to accurately represent the complex traffic patterns of an attack, especially in Internet-of-Things (IoT) networks. This paper studies the impact that seemingly similar features created by different network traffic flow exporters can have on the generalization and robustness of ML models. In addition to the original CSV files of the Bot-IoT, IoT-23, and CICIoT23 datasets, the raw network packets of their PCAP files were analysed with the HERA tool, generating new labelled flows and extracting consistent features for new CSV versions. To assess the usefulness of these new flows for intrusion detection, they were compared with the original versions and were used to…
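The step of turning raw packets into labelled flows with consistent features can be illustrated with a minimal sketch. This is not the HERA tool and does not reproduce its feature set: the `Packet` record, the `flow_key` and `extract_flows` helpers, and the four features computed here are all hypothetical names chosen for illustration. The sketch assumes packets have already been parsed out of a PCAP file and groups them into bidirectional flows keyed by endpoints and protocol.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical parsed packet record (not a HERA or PCAP library type)."""
    ts: float      # capture timestamp in seconds
    src: str       # source IP address
    sport: int     # source port
    dst: str       # destination IP address
    dport: int     # destination port
    proto: str     # transport protocol, e.g. "tcp" or "udp"
    size: int      # packet length in bytes


def flow_key(p: Packet):
    """Direction-independent key: both directions map to the same flow."""
    lo, hi = sorted([(p.src, p.sport), (p.dst, p.dport)])
    return (lo, hi, p.proto)


def extract_flows(packets):
    """Group packets into flows and compute the same features for each flow."""
    groups = defaultdict(list)
    for p in packets:
        groups[flow_key(p)].append(p)

    rows = []
    for (lo, hi, proto), pkts in groups.items():
        pkts.sort(key=lambda p: p.ts)
        total_bytes = sum(p.size for p in pkts)
        rows.append({
            "endpoint_a": lo,
            "endpoint_b": hi,
            "proto": proto,
            "packets": len(pkts),
            "bytes": total_bytes,
            "duration": pkts[-1].ts - pkts[0].ts,
            "mean_size": total_bytes / len(pkts),
        })
    return rows


# Illustrative packets only; these values are made up for the example.
sample = [
    Packet(0.0, "10.0.0.2", 50000, "10.0.0.9", 80, "tcp", 60),
    Packet(0.5, "10.0.0.9", 80, "10.0.0.2", 50000, "tcp", 60),
    Packet(1.0, "10.0.0.2", 50000, "10.0.0.9", 80, "tcp", 120),
    Packet(2.0, "10.0.0.3", 40000, "10.0.0.9", 53, "udp", 90),
]
flows = extract_flows(sample)
```

Because every capture passes through the same function, the resulting rows share one column schema regardless of which dataset the packets came from, which is the kind of consistency the abstract contrasts with features produced by different flow exporters.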
Taxonomy
Topics: Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting · Software-Defined Networks and 5G
