Few-Sample Named Entity Recognition for Security Vulnerability Reports by Fine-Tuning Pre-Trained Language Models
Guanqun Yang, Shay Dineen, Zhipeng Lin, Xueqing Liu

TL;DR
This paper demonstrates that fine-tuning pre-trained language models with minimal labeled data can effectively extract information from security vulnerability reports, significantly reducing the need for large labeled datasets.
Contribution
It introduces a novel approach applying pre-trained language models to few-sample NER for security reports, reducing labeling requirements by over 88%.
Findings
Fine-tuned pre-trained language models achieve comparable or better performance than existing approaches that rely on large labeled training sets, while using far fewer labeled samples.
Significant reduction in labeling effort: 90% in fine-tuning and 88.8% in transfer learning.
Open-source code provided for reproducibility and further research.
Abstract
Public security vulnerability reports (e.g., CVE reports) play an important role in the maintenance of computer and network systems. Security companies and administrators rely on information from these reports to prioritize the development and deployment of patches for their customers. Since these reports are unstructured text, automatic information extraction (IE) can help scale up processing by converting the unstructured reports into structured forms, e.g., software names and versions and vulnerability types. Existing work on automated IE for security vulnerability reports often relies on a large number of labeled training samples. However, creating a massive labeled training set is both expensive and time-consuming. In this work, for the first time, we propose to investigate this problem where only a small number of labeled training samples are available. In particular, we…
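To make the extraction task in the abstract concrete, the sketch below shows how per-token NER predictions might be grouped into structured fields such as software name, version, and vulnerability type. It is only an illustration under stated assumptions: the BIO tagging convention, the label names (`SOFTWARE`, `VERSION`, `VULN_TYPE`), the helper `decode_bio`, and the example sentence are all hypothetical and are not taken from the paper or its released code.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs.

    Hypothetical helper: assumes one tag per token, with tags of the
    form "B-X", "I-X", or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any entity still open.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" tag, or an "I-" tag that does not continue the open
            # entity: close the open entity and skip this token.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []

    # Flush an entity that runs to the end of the sentence.
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Made-up example sentence with assumed label names.
tokens = ["Buffer", "overflow", "in", "ExampleServer", "before", "2.4.1"]
tags = ["B-VULN_TYPE", "I-VULN_TYPE", "O", "B-SOFTWARE", "O", "B-VERSION"]
entities = decode_bio(tokens, tags)
print(entities)
```

In a few-sample setting, the tags would come from a fine-tuned token classifier rather than being written by hand; the decoding step that turns tags into structured records stays the same.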
Taxonomy
Topics: Web Application Security Vulnerabilities · Topic Modeling · Software Engineering Research
