TL;DR
This paper evaluates methods for extracting MITRE ATT&CK attack techniques from cyber threat reports and proposes a two-step pipeline (LLM summarisation followed by a retrained SciBERT classifier) that improves F1-scores over baseline models, with several techniques surpassing 0.90.
Contribution
It introduces a new two-step approach combining LLM summarization and a retrained SciBERT model to enhance attack technique extraction from threat reports.
Findings
Class imbalance, overfitting, and domain-specific complexity impede accurate technique extraction.
The proposed pipeline improves F1-scores over baseline models, with several attack techniques exceeding 0.90.
The approach enhances the efficiency of web-based CTI systems.
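Because the findings are reported per attack technique, it can help to see how a per-technique F1-score is computed and why class imbalance matters for it. The following is a minimal sketch and not the paper's evaluation code: the function name `per_label_f1`, the toy labels, and the toy predictions are all assumptions made for illustration.

```python
from collections import Counter

def per_label_f1(gold, pred):
    """Compute F1 per label for multi-label predictions.

    gold, pred: lists of sets of technique IDs, one set per example.
    (Hypothetical helper for illustration; not from the paper.)
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for label in p & g:
            tp[label] += 1          # predicted and correct
        for label in p - g:
            fp[label] += 1          # predicted but wrong
        for label in g - p:
            fn[label] += 1          # missed
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

# Toy data (invented): one frequent technique, one rare technique.
gold = [{"T1059"}, {"T1059"}, {"T1566"}, {"T1059", "T1566"}]
pred = [{"T1059"}, {"T1059"}, set(), {"T1059"}]

scores = per_label_f1(gold, pred)
macro_f1 = sum(scores.values()) / len(scores)
```

In this toy case a model that only ever predicts the frequent label scores perfectly on it and zero on the rare one, which is the kind of gap that per-technique reporting makes visible.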
Abstract
This work evaluates the performance of Cyber Threat Intelligence (CTI) extraction methods in identifying attack techniques from threat reports available on the web using the MITRE ATT&CK framework. We analyse four configurations utilising state-of-the-art tools, including the Threat Report ATT&CK Mapper (TRAM) and open-source Large Language Models (LLMs) such as Llama2. Our findings reveal significant challenges, including class imbalance, overfitting, and domain-specific complexity, which impede accurate technique extraction. To mitigate these issues, we propose a novel two-step pipeline: first, an LLM summarises the reports, and second, a retrained SciBERT model processes a rebalanced dataset augmented with LLM-generated data. This approach achieves an improvement in F1-scores compared to baseline models, with several attack techniques surpassing an F1-score of 0.90. Our contributions…
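The two-step pipeline described in the abstract can be sketched as a simple composition of a summariser and a classifier. This is only an outline of the control flow under stated assumptions: in the paper the first step is an LLM and the second a retrained SciBERT model, whereas here both are replaced by toy stand-in functions (`toy_summarise`, `toy_classify`) so the example runs without any model. All names and the keyword table are hypothetical.

```python
from typing import Callable, List

Summariser = Callable[[str], str]
Classifier = Callable[[str], List[str]]

def extract_techniques(report: str,
                       summarise: Summariser,
                       classify: Classifier) -> List[str]:
    """Step 1: condense the report. Step 2: label the summary."""
    summary = summarise(report)
    return sorted(set(classify(summary)))

def toy_summarise(text: str) -> str:
    # Stand-in for the LLM step (assumption): keep the first two sentences.
    return ". ".join(text.split(". ")[:2])

# Stand-in for the classifier step (assumption): keyword lookup.
KEYWORDS = {"phishing": "T1566", "powershell": "T1059"}

def toy_classify(text: str) -> List[str]:
    lowered = text.lower()
    return [tid for word, tid in KEYWORDS.items() if word in lowered]

report = ("The actor sent phishing emails. A PowerShell script then ran. "
          "Unrelated boilerplate follows.")
techniques = extract_techniques(report, toy_summarise, toy_classify)
```

Passing the two steps as callables keeps the pipeline shape independent of the models behind it, so a real summariser or classifier could be substituted without changing `extract_techniques`.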
