CANAL -- Cyber Activity News Alerting Language Model: Empirical Approach vs. Expensive LLM
Urjitkumar Patel, Fang-Chun Yeh, Chinmay Gondhalekar

TL;DR
This paper introduces CANAL, a fine-tuned BERT-based model for cyber threat detection from news, outperforming larger LLMs in accuracy and cost, with a novel signal discovery module for emerging threats.
Contribution
The paper presents a cost-effective, empirically trained cyber threat classification model using a novel silver labeling approach and introduces a new module for detecting emerging cyber signals.
Findings
CANAL outperforms GPT-4, LLaMA, and Zephyr in both accuracy and cost.
The silver labeling approach effectively trains the model on less expensive data.
The Cyber Signal Discovery module enhances the detection of emerging threats.
Abstract
In today's digital landscape, where cyber attacks have become the norm, detecting cyber attacks and threats is critical across diverse domains. Our research presents a new empirical framework for cyber threat modeling that parses and categorizes cyber-related information from news articles, enhancing real-time vigilance for market stakeholders. At the core of this framework is a fine-tuned BERT model, which we call CANAL (Cyber Activity News Alerting Language Model), tailored for cyber categorization using a novel silver labeling approach powered by Random Forest. We benchmark CANAL against larger, costlier LLMs, including GPT-4, LLaMA, and Zephyr, highlighting their zero- to few-shot learning in cyber news classification. CANAL outperforms all of these LLM counterparts in both accuracy and cost-effectiveness. Furthermore, we…
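The summary does not spell out how silver labeling works. One common reading is that a cheaper model labels unlabeled articles, and only its high-confidence predictions are kept as training data for the larger model. The sketch below illustrates only that confidence-filtering step and is not the authors' implementation. In CANAL the scorer is a Random Forest. Here a hypothetical keyword stub (`toy_score`, `CYBER_TERMS`) stands in so the sketch runs with the standard library alone. The label names, the helper `silver_label`, and the 0.8 threshold are likewise illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical term list; the paper's actual features and categories are not given here.
CYBER_TERMS = {"ransomware", "breach", "malware", "phishing", "ddos"}


def toy_score(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for a Random Forest's class probabilities."""
    hits = len(set(text.lower().split()) & CYBER_TERMS)
    p_cyber = min(0.1 + 0.4 * hits, 0.95)  # more cyber terms -> more confident "cyber"
    return {"cyber": p_cyber, "non_cyber": 1.0 - p_cyber}


def silver_label(
    texts: Sequence[str],
    score: Callable[[str], Dict[str, float]],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep (text, label) pairs whose top-class probability meets the threshold."""
    silver: List[Tuple[str, str]] = []
    for text in texts:
        label, prob = max(score(text).items(), key=lambda kv: kv[1])
        if prob >= threshold:  # ambiguous articles are dropped, not guessed
            silver.append((text, label))
    return silver


if __name__ == "__main__":
    docs = [
        "Hospital hit by ransomware after data breach",
        "Central bank raises interest rates",
        "Phishing campaign targets retail customers",
    ]
    for text, label in silver_label(docs, toy_score):
        print(label, "|", text)
```

With these inputs, the first two articles receive confident labels and the third (one cyber term, probability 0.5) is discarded. The surviving pairs would then serve as training data for fine-tuning a BERT classifier, which this sketch does not show.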
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Attention Dropout · Position-Wise Feed-Forward Layer · Weight Decay · Dropout · Label Smoothing
