PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts
Nikki McNeil, Robert A. Bridges, Michael D. Iannacone, Bogdan Czejdo, Nicolas Perez, John R. Goodall

TL;DR
PACE is a semi-supervised learning algorithm that improves entity extraction in cybersecurity texts by balancing efficiency and accuracy, enabling timely discovery of security-related information from online sources.
Contribution
It introduces a novel enhancement to bootstrapping for entity extraction: a time-memory trade-off that avoids a costly corpus search while strengthening pattern nomination, which is expected to improve extraction accuracy in cyber-security text analysis.
Findings
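Strengthened pattern nomination, expected to increase extraction accuracy
Avoids a costly corpus search during bootstrapping
Implementation in the cyber-security domain discussed, along with the challenges security text poses for Natural Language Processing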
Abstract
Public disclosure of important security information, such as knowledge of vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and other online sources months before proper classification into structured databases. In order to facilitate timely discovery of such knowledge, we propose a novel semi-supervised learning algorithm, PACE, for identifying and classifying relevant entities in text sources. The main contribution of this paper is an enhancement of the traditional bootstrapping method for entity extraction by employing a time-memory trade-off that simultaneously circumvents a costly corpus search while strengthening pattern nomination, which should increase accuracy. An implementation in the cyber-security domain is discussed as well as challenges to Natural Language Processing imposed by the security domain.
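The abstract does not specify PACE's pattern types, scoring function, or index layout, so the following is only a minimal sketch of the general idea under stated assumptions. It shows a generic bootstrapping loop that spends memory on two inverted indices (pattern to matched tokens, token to patterns), built in a single pass over the corpus, so that each round of pattern nomination is a dictionary lookup rather than a fresh corpus search. The one-word context patterns, the RlogF-style score (using log2(F + 1) instead of log2(F) so single-seed patterns do not score zero), the `top_k` cutoff, the helper names `patterns_at`, `build_index`, `score`, and `bootstrap`, and the toy sentences are all hypothetical and not taken from the paper.

```python
# Hypothetical sketch of index-backed bootstrapping; NOT PACE's actual implementation.
import math
from collections import defaultdict


def patterns_at(tokens, i):
    """Hypothetical pattern type: the one-word left or right context of token i."""
    pats = []
    if i > 0:
        pats.append(tokens[i - 1] + " _")
    if i + 1 < len(tokens):
        pats.append("_ " + tokens[i + 1])
    return pats


def build_index(sentences):
    """Single pass over the corpus; trades memory for never rescanning it."""
    pat_to_ents = defaultdict(set)  # pattern -> tokens it matches
    ent_to_pats = defaultdict(set)  # token -> patterns it occurs in
    for sent in sentences:
        tokens = sent.split()
        for i, tok in enumerate(tokens):
            for pat in patterns_at(tokens, i):
                pat_to_ents[pat].add(tok)
                ent_to_pats[tok].add(pat)
    return pat_to_ents, ent_to_pats


def score(pat, pat_to_ents, known):
    """RlogF-style score: precision F/N times log2(F + 1), where F = known matches."""
    extractions = pat_to_ents[pat]
    f = len(extractions & known)
    return (f / len(extractions)) * math.log2(f + 1)


def bootstrap(sentences, seeds, iterations=3, top_k=1):
    pat_to_ents, ent_to_pats = build_index(sentences)
    known = set(seeds)
    for _ in range(iterations):
        # Nominate patterns by index lookup instead of searching the corpus.
        candidates = set()
        for ent in known:
            candidates |= ent_to_pats.get(ent, set())
        # Keep only patterns that would contribute at least one new entity.
        useful = [p for p in candidates if pat_to_ents[p] - known]
        if not useful:
            break
        # Highest score first; ties broken deterministically by pattern text.
        useful.sort(key=lambda p: (score(p, pat_to_ents, known), p), reverse=True)
        for pat in useful[:top_k]:
            known |= pat_to_ents[pat]
    return known


# Toy sentences invented for illustration only.
corpus = [
    "attackers exploited CVE-2014-0160 in OpenSSL",
    "attackers exploited CVE-2014-6271 in Bash",
    "researchers disclosed CVE-2014-6271 today",
    "researchers disclosed CVE-2015-0235 today",
]

if __name__ == "__main__":
    print(sorted(bootstrap(corpus, {"CVE-2014-0160"})))
    # ['CVE-2014-0160', 'CVE-2014-6271', 'CVE-2015-0235']
```

On the toy corpus, the seed CVE-2014-0160 nominates the pattern "exploited _", which pulls in CVE-2014-6271; that entity's own contexts then nominate "disclosed _", yielding CVE-2015-0235. The price of the trade-off is memory: both indices grow with the number of distinct (token, pattern) pairs in the corpus.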
Taxonomy
Topics: Spam and Phishing Detection · Data Quality and Management · Advanced Text Analysis Techniques
