NapierOne: A modern mixed file data set alternative to Govdocs1
Simon R Davies, Richard Macfarlane, William J Buchanan

TL;DR
NapierOne is a detailed, reproducible mixed file dataset designed for cybersecurity research, especially ransomware detection, addressing prior reproducibility issues and complementing existing datasets like Govdocs1.
Contribution
It introduces a comprehensive, well-documented dataset creation methodology and provides a diverse set of real-world files for improved research consistency.
Findings
Created a diverse dataset of over 500,000 unique real-world files, organized into subsets of 5,000 files each
Identified common file types and characteristics used in ransomware
Enhanced reproducibility in cybersecurity research datasets
Abstract
It was found when reviewing the ransomware detection research literature that almost no proposal provided enough detail on how the test data set was created, or sufficient description of its actual content, to allow it to be recreated by other researchers interested in reconstructing their environment and validating the research results. A modern cybersecurity mixed file data set called NapierOne is presented, primarily aimed at, but not limited to, ransomware detection and forensic analysis research. NapierOne was designed to address this deficiency in reproducibility and improve consistency by facilitating research replication and repeatability. The methodology used in the creation of this data set is also described in detail. The data set was inspired by the Govdocs1 data set and it is intended that NapierOne be used as a complement to this original data set. An investigation was…
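The reproducibility gap described in the abstract comes down to test corpora whose content is never recorded in a way others can check. As a minimal sketch, and not the authors' methodology, the Python below shows one way a mixed file corpus could be described so that another researcher can verify they hold the same files: a manifest listing each file's relative path, extension, size, and SHA-256 digest, plus a count of files per extension. The function names (sha256_of, build_manifest, count_by_extension) and the manifest fields are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list:
    """List every regular file under root in a stable, sorted order.

    The fields recorded here are an illustrative assumption, not a
    format defined by the NapierOne paper.
    """
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "extension": path.suffix.lower(),
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return entries


def count_by_extension(manifest: list) -> Counter:
    """Summarize how many files of each extension the corpus holds."""
    return Counter(entry["extension"] for entry in manifest)
```

Publishing such a manifest alongside results would let a reader compare digests against their own copy of the corpus. Note that extensions are only a rough proxy for file type, so a fuller description would likely also record detected content types.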
