Improving Zero-Day Malware Testing Methodology Using Statistically Significant Time-Lagged Test Samples
Konstantin Berlin, Joshua Saxe

TL;DR
This paper introduces a statistically justified time-delay sampling method for evaluating zero-day malware detection, shows that accurate evaluation requires large sample sizes, and proposes more detailed labeling to make such evaluations practical.
Contribution
It proposes a novel time-delay sampling approach and detailed labeling scheme to enhance zero-day malware detection evaluation.
Findings
Large sample sizes are needed for accurate evaluation.
Time-delay sampling enables efficient collection of unseen samples.
Enhanced labeling improves modeling of file distribution.
Abstract
Enterprise networks are in constant danger of being breached by cyber-attackers, but deciding which security tools to deploy to mitigate this risk requires carefully designed evaluation of security products. One of the most important metrics for a protection product is how well it is able to stop malware, specifically "zero-day" malware that has not been seen by the security community before. However, evaluating zero-day performance is difficult because of the large number of previously unseen samples that are needed to properly measure the true and false positive rates, and because of the challenges involved in accurately labeling these samples. This paper addresses these issues from a statistical and practical perspective. Our contributions include first showing that the number of benign files needed for proper evaluation is on the order of millions, and the number of malware…
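To give a feel for why benign sample counts can reach the millions, here is a rough back-of-the-envelope sketch. It is not the paper's derivation: it assumes a plain binomial model with a normal-approximation confidence interval, and the target false positive rates, the 10% relative error, and the function name `benign_samples_needed` are all illustrative assumptions.

```python
import math


def benign_samples_needed(fpr, rel_error, z=1.96):
    """Approximate number of benign files needed so that the confidence
    interval half-width around a false positive rate `fpr` is at most
    `rel_error * fpr`.

    Assumption: independent samples and a normal approximation to the
    binomial; z=1.96 corresponds to roughly 95% confidence.
    """
    if not 0.0 < fpr < 1.0:
        raise ValueError("fpr must be strictly between 0 and 1")
    if rel_error <= 0.0:
        raise ValueError("rel_error must be positive")
    abs_error = rel_error * fpr
    # Standard sample-size formula: n = z^2 * p * (1 - p) / e^2
    return math.ceil(z ** 2 * fpr * (1.0 - fpr) / abs_error ** 2)


# Hypothetical target false positive rates, chosen only for illustration.
for target_fpr in (1e-2, 1e-3, 1e-4):
    n = benign_samples_needed(target_fpr, rel_error=0.10)
    print(f"FPR {target_fpr:g}: about {n:,} benign samples")
```

Under these assumptions, the required count grows roughly in inverse proportion to the false positive rate being measured, so a rate near one in ten thousand already calls for a few million benign files.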
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Testing and Debugging Techniques · Network Security and Intrusion Detection
