NetFlow Datasets for Machine Learning-based Network Intrusion Detection Systems
Mohanad Sarhan, Siamak Layeghy, Nour Moustafa, Marius Portmann

TL;DR
This paper introduces NetFlow feature datasets derived from four benchmark NIDS datasets to facilitate consistent evaluation of ML-based intrusion detection systems across different data sources.
Contribution
It provides publicly available NetFlow datasets from multiple benchmarks, enabling more reliable and comparable ML model evaluations for network intrusion detection.
Findings
NetFlow features yield similar binary classification results across datasets.
Multi-class classification performance is lower with NetFlow features compared to original features.
NetFlow features are easier to extract from network traffic, and the resulting datasets are publicly available for research.
Abstract
Machine Learning (ML)-based Network Intrusion Detection Systems (NIDSs) have proven to be a reliable intelligence tool for protecting networks against cyberattacks. Network data features have a great impact on the performance of ML-based NIDSs. However, evaluations of ML models are often unreliable, as each ML-enabled NIDS is trained and validated using different data features that may not contain security events. Therefore, a common ground feature set from multiple datasets is required to evaluate an ML model's detection accuracy and its ability to generalise across datasets. This paper presents NetFlow features from four benchmark NIDS datasets known as UNSW-NB15, BoT-IoT, ToN-IoT, and CSE-CIC-IDS2018 using their publicly available packet capture files. In a real-world scenario, NetFlow features are relatively easier to extract from network traffic compared to the complex features…
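As a rough illustration of what a "common ground feature set" means in practice, the sketch below aligns two flow tables on the features they share. It is not the authors' extraction pipeline: the helper names (`common_features`, `project`), the dataset names, the `Label` column name, and the toy records are all hypothetical, and the field names are only examples of NetFlow-style attributes.

```python
from typing import Dict, List

Record = Dict[str, float]


def common_features(datasets: Dict[str, List[Record]], label: str = "Label") -> List[str]:
    """Return the sorted feature names present in every record of every dataset."""
    shared = None
    for records in datasets.values():
        for rec in records:
            keys = set(rec) - {label}  # the label column is not a feature
            shared = keys if shared is None else shared & keys
    return sorted(shared or [])


def project(records: List[Record], features: List[str], label: str = "Label") -> List[Record]:
    """Keep only the shared features (plus the label) in each record."""
    return [{**{f: rec[f] for f in features}, label: rec[label]} for rec in records]


# Toy, made-up flow records; real datasets would have many more rows and fields.
toy = {
    "dataset_a": [{"IN_BYTES": 120, "OUT_BYTES": 40, "PROTOCOL": 6, "extra_a": 1, "Label": 0}],
    "dataset_b": [{"IN_BYTES": 900, "OUT_BYTES": 0, "PROTOCOL": 17, "extra_b": 3, "Label": 1}],
}

features = common_features(toy)
aligned = {name: project(recs, features) for name, recs in toy.items()}
print(features)  # ['IN_BYTES', 'OUT_BYTES', 'PROTOCOL']
```

Once every dataset is projected onto the same columns, a model trained on one source can be scored on another without retraining on a different feature layout, which is the kind of cross-dataset comparison the abstract argues for.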
