Network Intrusion Datasets: A Survey, Limitations, and Recommendations
Patrik Goldschmidt, Daniela Chudá

TL;DR
This paper systematically reviews 89 public network intrusion detection datasets, analyzing their properties, limitations, and suitability for specific use cases, with the aim of guiding future research and improving data quality in cybersecurity.
Contribution
It provides a comprehensive comparison of datasets, discusses domain challenges, and offers best practices for dataset selection and generation in NIDS research.
Findings
Many datasets remain underutilized and poorly understood.
Data limitations hinder the development of robust NIDS.
Best practices can improve dataset quality and research outcomes.
Abstract
Data-driven cyberthreat detection has become a crucial defense technique in modern cybersecurity. Network defense, supported by Network Intrusion Detection Systems (NIDSs), has also increasingly adopted data-driven approaches, leading to greater reliance on data. Despite the importance of data, its scarcity has long been recognized as a major obstacle in NIDS research. In response, the community has published many new datasets recently. However, many of them remain largely unknown and unanalyzed, leaving researchers uncertain about their suitability for specific use cases. In this paper, we aim to address this knowledge gap by performing a systematic literature review (SLR) of 89 public datasets for NIDS research. Each dataset is comparatively analyzed across 13 key properties, and its potential applications are outlined. Beyond the review, we also discuss domain-specific challenges…
Taxonomy
Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Spam and Phishing Detection
