Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data
Benjamin L. Bullough, Anna K. Yanchenko, Christopher L. Smith, Joseph R. Zipkin

TL;DR
This paper evaluates machine learning methods for predicting whether disclosed software vulnerabilities will be exploited, highlighting methodological issues in prior studies and emphasizing the importance of data selection for real-world applicability.
Contribution
The paper replicates and compares existing approaches, showing how data selection affects predictive performance and offering methodological guidance for future research.
Findings
Prior models' performance is highly sensitive to data selection.
Methodological flaws can inflate perceived predictive accuracy.
Proper data handling is crucial for real-world vulnerability exploitation prediction.
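One way data selection can inflate measured accuracy is when training data includes vulnerabilities disclosed after those in the test set. The summary above does not spell out which issues the paper examines, so treat the following as a minimal sketch under that assumption, not the authors' procedure. The record layout, the identifiers, and the helper names (`random_split`, `chronological_split`, `has_temporal_leak`) are all hypothetical.

```python
import random
from datetime import date

# Hypothetical records: (identifier, disclosure date, exploited label).
# These are made-up placeholders, not real vulnerability data.
records = [
    ("VULN-A", date(2012, 3, 1), 0),
    ("VULN-B", date(2012, 9, 15), 1),
    ("VULN-C", date(2013, 1, 20), 0),
    ("VULN-D", date(2013, 6, 5), 0),
    ("VULN-E", date(2014, 2, 11), 1),
    ("VULN-F", date(2014, 8, 30), 0),
]


def random_split(rows, test_fraction, seed=0):
    """Shuffle and split, ignoring disclosure dates (may mix past and future)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def chronological_split(rows, cutoff):
    """Train on records disclosed before `cutoff`, test on the rest."""
    train = [r for r in rows if r[1] < cutoff]
    test = [r for r in rows if r[1] >= cutoff]
    return train, test


def has_temporal_leak(train, test):
    """True if any training record is dated on or after the earliest test record."""
    if not train or not test:
        return False
    earliest_test = min(r[1] for r in test)
    return any(r[1] >= earliest_test for r in train)


train, test = chronological_split(records, cutoff=date(2014, 1, 1))
print("chronological leak:", has_temporal_leak(train, test))

r_train, r_test = random_split(records, test_fraction=0.33)
print("random-split leak:", has_temporal_leak(r_train, r_test))
```

A chronological split mirrors deployment, where a model only sees vulnerabilities disclosed before the ones it must score. A random split can place later disclosures in the training set, which is one plausible source of overly optimistic results.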
Abstract
Each year, thousands of software vulnerabilities are discovered and reported to the public. Unpatched known vulnerabilities are a significant security risk. It is imperative that software vendors quickly provide patches once vulnerabilities are known and users quickly install those patches as soon as they are available. However, most vulnerabilities are never actually exploited. Since writing, testing, and installing software patches can involve considerable resources, it would be desirable to prioritize the remediation of vulnerabilities that are likely to be exploited. Several published research studies have reported moderate success in applying machine learning techniques to the task of predicting whether a vulnerability will be exploited. These approaches typically use features derived from vulnerability databases (such as the summary text describing the vulnerability) or social…
