A Comparison of Vulnerability Feature Extraction Methods from Textual Attack Patterns
Refat Othman, Bruno Rossi, Barbara Russo

TL;DR
This paper compares five textual feature extraction methods for vulnerability descriptions in threat reports, finding TF-IDF to be the most effective for identifying related vulnerabilities.
Contribution
It provides a comparative analysis of five feature extraction techniques, highlighting TF-IDF's superior performance in vulnerability text classification.
Findings
TF-IDF achieves 75% precision and 64% F1 score.
The other four methods (LSI, BERT, MiniLM, and RoBERTa) perform worse than TF-IDF.
The study offers guidance for selecting feature extraction methods in cybersecurity threat analysis.
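To put the two reported figures in relation, the sketch below applies the standard definition of F1 as the harmonic mean of precision and recall. It assumes both numbers describe a single precision/recall pair; if the paper averages scores across classes or queries, the implied recall computed here (roughly 56%) need not match any value the authors report. The function names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R. Only meaningful when 2P > F1."""
    return f1 * precision / (2 * precision - f1)


# Figures from the summary: precision 0.75, F1 0.64.
# ASSUMPTION: both refer to the same precision/recall pair.
recall = implied_recall(0.75, 0.64)
print(round(recall, 3))
print(round(f1_score(0.75, recall), 2))
```

A recall lower than the precision would mean that the matches returned are mostly correct while a larger share of related vulnerabilities goes unretrieved.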
Abstract
Nowadays, threat reports from cybersecurity vendors incorporate detailed descriptions of attacks within unstructured text. Knowing vulnerabilities that are related to these reports helps cybersecurity researchers and practitioners understand and adjust to evolving attacks and develop mitigation plans. This paper aims to aid cybersecurity researchers and practitioners in choosing attack extraction methods to enhance the monitoring and sharing of threat intelligence. In this work, we examine five feature extraction methods (TF-IDF, LSI, BERT, MiniLM, RoBERTa) and find that Term Frequency-Inverse Document Frequency (TF-IDF) outperforms the other four methods with a precision of 75% and an F1 score of 64%. The findings offer valuable insights to the cybersecurity community, and our research can aid cybersecurity researchers in evaluating and comparing the effectiveness of upcoming…
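As a rough illustration of what TF-IDF feature extraction involves, the following minimal sketch builds TF-IDF vectors in plain Python and ranks vulnerability descriptions by cosine similarity to an attack description. It is not the authors' pipeline: the tokenizer, the smoothed IDF formula, the use of cosine ranking, and the three example descriptions are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer).
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # ASSUMPTION: smoothed IDF, so terms found in every document keep a positive weight.
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_vulnerabilities(attack_text, vulnerability_texts):
    """Return (score, index) pairs, most similar vulnerability description first."""
    vectors = tfidf_vectors([attack_text] + list(vulnerability_texts))
    query, candidates = vectors[0], vectors[1:]
    scores = [(cosine(query, vec), idx) for idx, vec in enumerate(candidates)]
    return sorted(scores, reverse=True)


# Hypothetical texts, invented for illustration only.
attack = "Attacker injects SQL statements through a web form to read the database"
vulnerabilities = [
    "SQL injection in login form allows reading database records",
    "Buffer overflow in image parser allows remote code execution",
    "Cross-site scripting in comment field allows script execution in the browser",
]
for score, idx in rank_vulnerabilities(attack, vulnerabilities):
    print(f"{score:.3f}  {vulnerabilities[idx]}")
```

Because TF-IDF relies on exact term overlap, "injects" and "injection" do not match in this sketch; stemming or a library implementation such as scikit-learn's `TfidfVectorizer` would be a more typical choice in practice.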
Taxonomy
Topics: Web Application Security Vulnerabilities · Information and Cyber Security · Software Engineering Research
