TL;DR
This paper introduces a set of maintainable, labeled log datasets generated in a testbed environment for evaluating intrusion detection systems, addressing the lack of publicly available, reproducible datasets.
Contribution
It presents a scalable, model-driven approach to generate and label diverse intrusion detection datasets in a controlled testbed environment.
Findings
8 datasets provided, containing 20 distinct types of log files
8 of those log files labeled for 10 unique attack steps
Open-source code and datasets published online
Abstract
Intrusion detection systems (IDSs) monitor system logs and network traffic to recognize malicious activities in computer networks. Evaluating and comparing IDSs with respect to their detection accuracy is therefore essential for selecting one for a specific use case. Despite a great need, hardly any labeled intrusion detection datasets are publicly available. As a consequence, evaluations are often carried out either on datasets from real infrastructures, where analysts cannot control system parameters or generate a reliable ground truth, or on private datasets that prevent reproducibility of results. As a solution, we present a collection of maintainable log datasets collected in a testbed representing a small enterprise. We employ extensive state machines to simulate normal user behavior and inject a multi-step attack. For scalable testbed deployment, we use concepts from model-driven…
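The abstract describes simulating normal user behavior with state machines and injecting a labeled multi-step attack. The following is a minimal sketch of that idea, not the authors' implementation: the state names, transition probabilities, attack step names, and the functions `simulate_user` and `inject_attack` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical user-behavior state machine (assumption, not from the paper):
# each state maps to a list of (next_state, probability) pairs.
TRANSITIONS = {
    "idle": [("login", 1.0)],
    "login": [("browse", 0.7), ("read_mail", 0.3)],
    "browse": [("browse", 0.5), ("read_mail", 0.2), ("logout", 0.3)],
    "read_mail": [("browse", 0.4), ("logout", 0.6)],
    "logout": [("idle", 1.0)],
}


def simulate_user(steps, seed=0, start="idle"):
    """Walk the state machine for `steps` transitions and emit labeled events."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    state = start
    events = []
    for t in range(steps):
        targets, weights = zip(*TRANSITIONS[state])
        state = rng.choices(targets, weights=weights, k=1)[0]
        events.append({"t": t, "event": state, "label": "normal"})
    return events


def inject_attack(events, attack_steps, at):
    """Return a new event list with the attack steps inserted at index `at`.

    Each injected event is labeled with its attack step name, which serves
    as the ground truth for evaluation.
    """
    attack = [{"event": step, "label": step} for step in attack_steps]
    merged = [dict(e) for e in events[:at]] + attack + [dict(e) for e in events[at:]]
    for i, event in enumerate(merged):
        event["t"] = i  # renumber so timestamps stay sequential
    return merged


if __name__ == "__main__":
    normal = simulate_user(steps=8, seed=42)
    # Hypothetical attack step names, for illustration only.
    log = inject_attack(normal, ["scan", "exploit", "escalate"], at=4)
    for entry in log:
        print(entry["t"], entry["event"], entry["label"])
```

Because the random generator is seeded, rerunning with the same seed reproduces the same event sequence, and varying the seed or the injection point yields different datasets with the same labeling scheme.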
