PicoDomain: A Compact High-Fidelity Cybersecurity Dataset
Craig Laprade, Benjamin Bowman, H. Howie Huang

TL;DR
PicoDomain is a small, high-quality cybersecurity dataset of Zeek logs from realistic intrusions, designed for efficient validation and development of cybersecurity analytics tools.
Contribution
The paper introduces a compact, high-fidelity dataset with ground truth, enabling rapid testing and development of cybersecurity solutions on enterprise-like traffic.
Findings
Validated the dataset using statistical analysis
Demonstrated the dataset's usefulness with machine learning techniques
Showed that the dataset supports rapid prototype development
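To give a rough sense of the rapid prototyping mentioned above, the sketch below parses a log in Zeek's default tab-separated format and counts responder ports. It assumes the logs use that TSV layout rather than JSON, which this summary does not confirm. The helper `read_zeek_tsv` and the sample records are hypothetical illustrations, not taken from PicoDomain.

```python
import io
from collections import Counter


def read_zeek_tsv(handle):
    """Yield one dict per record from a Zeek TSV log (assumed default format)."""
    fields = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Header lines start with '#'; the "#fields" line names the columns.
            if line.startswith("#fields"):
                fields = line.split("\t")[1:]
            continue
        yield dict(zip(fields, line.split("\t")))


# Made-up sample in conn.log style (hypothetical data, not from PicoDomain).
SAMPLE = "\n".join([
    "#separator \\x09",
    "#fields\tts\tid.orig_h\tid.resp_h\tid.resp_p\tproto",
    "#types\ttime\taddr\taddr\tport\tenum",
    "1.0\t10.0.0.5\t10.0.0.1\t53\tudp",
    "2.0\t10.0.0.5\t10.0.0.9\t445\ttcp",
    "3.0\t10.0.0.7\t10.0.0.9\t445\ttcp",
    "#close\t2019-01-01-00-00-00",
])

records = list(read_zeek_tsv(io.StringIO(SAMPLE)))

# Count connections per responder port; values stay strings as read from the log.
ports = Counter(r["id.resp_p"] for r in records)
print(ports.most_common(1))
```

In practice the same generator could be fed an open file handle for a log such as `conn.log`. All values are kept as strings here, so any numeric analysis would need explicit conversion guided by the `#types` header.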
Abstract
Analysis of cyber relevant data has become an area of increasing focus. As larger percentages of businesses and governments begin to understand the implications of cyberattacks, the impetus for better cybersecurity solutions has increased. Unfortunately, current cybersecurity datasets either offer no ground truth or do so with anonymized data. The former leads to a quandary when verifying results and the latter can remove valuable information. Additionally, most existing datasets are large enough to make them unwieldy during prototype development. In this paper we have developed the PicoDomain dataset, a compact high-fidelity collection of Zeek logs from a realistic intrusion using relevant Tools, Techniques, and Procedures. While simulated on a small-scale network, this dataset consists of traffic typical of an enterprise network, which can be utilized for rapid validation and…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Smart Grid Security and Resilience
