Balanced Datasets for IoT IDS
Alaa Alhowaide, Izzat Alsmadi, Jian Tang

TL;DR
This paper addresses the challenge of unbalanced datasets in IoT intrusion detection by proposing algorithms to generate balanced datasets, improving the training and testing of IDS models.
Contribution
It introduces novel sampling algorithms that create balanced and representative datasets from existing IoT cybersecurity datasets.
Findings
Sampling algorithms reliably produce balanced datasets
Algorithms effectively represent original datasets
Enhanced dataset quality for IoT IDS training
Abstract
As the Internet of Things (IoT) continues to grow, cyberattacks are becoming increasingly common. The security of IoT networks relies heavily on intrusion detection systems (IDSs). Developing an accurate and efficient IDS is a challenging task, and the absence of balanced datasets for training and testing makes it harder still. In this study, four commonly used datasets are visualized and analyzed. The study also proposes a sampling algorithm that generates a sample representative of the original dataset, and a second algorithm that generates a balanced dataset. Researchers can use this paper as a starting point when investigating cybersecurity and machine learning. The proposed sampling algorithms showed reliability in generating well-representing and balanced samples from NSL-KDD, UNSW-NB15, BotNetIoT-01,…
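The abstract does not spell out how the two algorithms work, so the following is only a generic illustration of the two goals it names, not the authors' method. It assumes a plain per-class (stratified) draw for the representative sample and random undersampling to the smallest class for the balanced one. The function names, the toy records, and the class labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def group_by_label(rows, label_of):
    """Group records by class label (label_of extracts the label from a record)."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    return groups


def stratified_sample(rows, label_of, fraction, seed=0):
    """Draw about `fraction` of each class, so class proportions are preserved.

    Generic stratified sampling; an assumption, not the paper's algorithm.
    """
    rng = random.Random(seed)
    sample = []
    for members in group_by_label(rows, label_of).values():
        k = max(1, round(len(members) * fraction))  # keep at least one per class
        sample.extend(rng.sample(members, k))
    return sample


def balanced_sample(rows, label_of, seed=0):
    """Undersample every class down to the size of the smallest class.

    Generic random undersampling; an assumption, not the paper's algorithm.
    """
    rng = random.Random(seed)
    groups = group_by_label(rows, label_of)
    k = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, k))
    return balanced


# Hypothetical, heavily imbalanced toy data: (label, record_id) pairs.
records = (
    [("normal", i) for i in range(900)]
    + [("dos", i) for i in range(90)]
    + [("probe", i) for i in range(10)]
)


def label(record):
    return record[0]


representative = stratified_sample(records, label, fraction=0.1)
balanced = balanced_sample(records, label)

print(Counter(label(r) for r in representative))
print(Counter(label(r) for r in balanced))
```

The stratified draw keeps the original 90:9:1 class ratio at a tenth of the size, while the undersampled set has the same number of records in every class at the cost of discarding most of the majority class. Which trade-off suits a given IDS experiment depends on the dataset and is not something this sketch settles.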
Taxonomy
Topics: Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques
