Automated Mapping of CVE Vulnerability Records to MITRE CWE Weaknesses
Ashraf Haddad, Najwa Aaraj, Preslav Nakov, Septimiu Fabian Mare

TL;DR
This paper introduces a new dataset and deep learning approach for automatically mapping CVE vulnerability records to MITRE CWE weaknesses, improving semantic understanding over traditional methods.
Contribution
It provides the first manually annotated dataset of 4,012 records and demonstrates the effectiveness of fine-tuned deep models like Sentence-BERT and rankT5 for this mapping task.
Findings
The fine-tuned deep models outperform BM25, BERT, and RoBERTa in accuracy.
Fine-tuned Sentence-BERT and rankT5 show significant performance gains.
The dataset enables supervised learning for vulnerability-to-weakness mapping.
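To make the ranking framing concrete, the following is a minimal sketch of a BM25-style lexical ranker, the kind of classical method the findings compare against. It is not the authors' implementation: the function names, the parameter values k1 and b, the three-entry CWE catalogue (real CWE identifiers with shortened names), and the CVE-style description are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score.

    k1 and b are common default values, assumed here for illustration.
    """
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy candidate set: real CWE identifiers with shortened names (illustrative).
cwe_catalogue = {
    "CWE-79": "improper neutralization of input during web page generation cross site scripting",
    "CWE-89": "improper neutralization of special elements used in an sql command sql injection",
    "CWE-787": "out of bounds write past the end of a buffer",
}

# Hypothetical CVE-style description, not taken from a real record.
cve_text = (
    "SQL injection in the login form allows remote attackers to execute "
    "arbitrary SQL commands via the username parameter."
)

ranking = bm25_rank(cve_text, cwe_catalogue)
for cwe_id, score in ranking:
    print(cwe_id, round(score, 3))
```

A lexical ranker like this only rewards exact token overlap, so a description that paraphrases a weakness without sharing its vocabulary scores poorly. That limitation is the gap the fine-tuned semantic models are reported to close.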
Abstract
In recent years, cyber-security threats have grown in both number and diversity, leading to an increase in their reporting and analysis. In response, many non-profit organizations have emerged in this domain, such as MITRE and OWASP, which actively track vulnerabilities and publish defense recommendations in standardized formats. As producing data in such formats manually is very time-consuming, there have been some proposals to automate the process. Unfortunately, a major obstacle to adopting supervised machine learning for this problem has been the lack of publicly available specialized datasets. Here, we aim to bridge this gap. In particular, we focus on mapping CVE records into MITRE CWE Weaknesses, and we release to the research community a manually annotated dataset of 4,012 records for this task. With a human-in-the-loop framework in mind, we…
Taxonomy
Topics: Network Security and Intrusion Detection · Information and Cyber Security · Data Quality and Management
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Attention Dropout · WordPiece · Dense Connections · Dropout · Weight Decay
