Technical Report: Generating the WEB-IDS23 Dataset
Eric Lanfer, Dominik Brockmann, Nils Aschenbruck

TL;DR
This paper introduces WEB-IDS23, a comprehensive, labeled dataset of over 12 million network traffic samples generated by a modular traffic generator, aimed at improving anomaly-based NIDS evaluation.
Contribution
The paper presents a novel traffic generator that creates a large, diverse, and well-labeled dataset including web attack types, addressing limitations of existing datasets such as insufficiently fine-grained labels and small sample sizes.
Findings
Generated over 12 million samples with 82 features
Includes diverse benign and malicious traffic with fine-grained labels
Contains underrepresented web attack types
Abstract
Anomaly-based Network Intrusion Detection Systems (NIDS) require correctly labelled, representative and diverse datasets for accurate evaluation and development. However, several widely used datasets lack sufficiently fine-grained labels; combined with small sample sizes, this can lead to overfitting that remains undetected even when evaluating on test data. Additionally, the cybersecurity landscape evolves quickly, and new attack mechanisms require the continuous creation of up-to-date datasets. To address these limitations, we developed a modular traffic generator that can simulate a wide variety of benign and malicious traffic. It incorporates multiple protocols, introduces variability through randomization techniques, and can produce attacks alongside corresponding benign traffic, as occurs in real-world scenarios. Using the traffic generator, we create a dataset capturing over 12…
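The abstract states only that the generator is modular, randomizes its behavior, and interleaves attacks with benign traffic. The following minimal Python sketch illustrates that idea under our own assumptions. The module catalogue, the `Session` schema, the timing distributions, and the helpers `generate_schedule` and `label_flow` are hypothetical and are not taken from the WEB-IDS23 generator. The sketch also assumes that fine-grained labels are assigned by matching a flow's client and start time against the ground-truth session schedule, a step the abstract does not describe.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One generated traffic session and its ground-truth label (hypothetical schema)."""
    client: str
    start: float   # seconds since the start of the capture
    end: float
    protocol: str
    label: str     # "benign" or a fine-grained attack name


# Hypothetical module catalogue of (protocol, label) pairs; not the authors' module list.
BENIGN_MODULES = [("http", "benign"), ("https", "benign"), ("ssh", "benign")]
ATTACK_MODULES = [("http", "sqli"), ("http", "xss"), ("ssh", "ssh_bruteforce")]


def generate_schedule(clients, duration, attack_ratio, seed=0):
    """Randomly interleave benign and attack sessions per client over the capture window."""
    rng = random.Random(seed)  # seeded so a generated schedule is reproducible
    sessions = []
    for client in clients:
        t = rng.uniform(0, 5)  # randomized start offset per client
        while t < duration:
            pool = ATTACK_MODULES if rng.random() < attack_ratio else BENIGN_MODULES
            protocol, label = rng.choice(pool)
            length = rng.uniform(1, 30)  # randomized session length
            sessions.append(Session(client, t, min(t + length, duration), protocol, label))
            # Randomized idle time (mean 10 s) keeps one client's sessions non-overlapping.
            t += length + rng.expovariate(1 / 10)
    return sorted(sessions, key=lambda s: s.start)


def label_flow(sessions, client, flow_start):
    """Return the label of the session that contains the flow's start time, if any."""
    for s in sessions:
        if s.client == client and s.start <= flow_start < s.end:
            return s.label
    return "unlabeled"


if __name__ == "__main__":
    sched = generate_schedule(["10.0.0.2", "10.0.0.3"], duration=600, attack_ratio=0.2, seed=42)
    first = sched[0]
    print(first.protocol, first.label, label_flow(sched, first.client, first.start))
```

Keeping the label attached to each scheduled session, rather than inferring it from the captured packets afterwards, is one simple way to obtain per-flow ground truth when attacks run concurrently with benign traffic from the same host.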
