Few-Sample Named Entity Recognition for Security Vulnerability Reports by Fine-Tuning Pre-Trained Language Models

Guanqun Yang; Shay Dineen; Zhipeng Lin; Xueqing Liu

arXiv:2108.06590 · cs.CL · August 17, 2021 · 1 citation

TL;DR

This paper demonstrates that fine-tuning pre-trained language models with minimal labeled data can effectively extract information from security vulnerability reports, significantly reducing the need for large labeled datasets.

Contribution

The paper applies fine-tuned pre-trained language models to few-sample named entity recognition (NER) on security vulnerability reports, cutting the number of labeled samples required by 88.8% to 90%.

Findings

01. Pre-trained models achieve comparable or better performance while using far fewer labeled samples.

02. The labeling effort drops significantly: by 90% in fine-tuning and by 88.8% in transfer learning.

03. Open-source code is provided for reproducibility and further research.

Abstract

Public security vulnerability reports (e.g., CVE reports) play an important role in the maintenance of computer and network systems. Security companies and administrators rely on information from these reports to prioritize the development and deployment of patches for their customers. Since these reports are unstructured text, automatic information extraction (IE) can help scale up processing by converting them into structured forms, e.g., software names and versions and vulnerability types. Existing work on automated IE for security vulnerability reports often relies on a large number of labeled training samples. However, creating a massive labeled training set is both expensive and time-consuming. In this work, for the first time, we propose to investigate this problem in the setting where only a small number of labeled training samples are available. In particular, we…
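
To make the extraction target concrete, the following is a minimal sketch of how token-level NER output could be turned into the structured form the abstract describes (software name, version, vulnerability type). It assumes a standard BIO tagging scheme. The example sentence, the product name `ExampleServer`, the tag names `SOFTWARE`, `VERSION`, and `VULN_TYPE`, and the helper `decode_bio` are all hypothetical illustrations and are not taken from the paper or its repository.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity that is still open.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open entity
            # and skip this token.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []

    # Close an entity that runs to the end of the sentence.
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical report sentence and hypothetical model predictions.
tokens = ["Buffer", "overflow", "in", "ExampleServer", "before", "2.4.1",
          "allows", "remote", "attackers"]
tags = ["B-VULN_TYPE", "I-VULN_TYPE", "O", "B-SOFTWARE", "O", "B-VERSION",
        "O", "O", "O"]

entities = decode_bio(tokens, tags)
for entity_type, text in entities:
    print(f"{entity_type}: {text}")
```

In a few-sample setting, the per-token tags would come from a fine-tuned language model trained on only a small labeled subset; the decoding step shown here is independent of how those tags are produced.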

Code & Models

Repositories

guanqun-yang/fewvulnerability (PyTorch, official)

Taxonomy

Topics: Web Application Security Vulnerabilities · Topic Modeling · Software Engineering Research