Towards Effective Identification of Attack Techniques in Cyber Threat Intelligence Reports using Large Language Models

Hoang Cuong Nguyen; Shahroz Tariq; Mohan Baruwal Chhetri; Bao Quoc Vo

arXiv:2505.03147 · cs.CR · May 7, 2025

TL;DR

This paper evaluates methods for extracting attack techniques from cyber threat reports, including large language models, and proposes a two-step pipeline that improves F1-scores over baseline models, with several techniques exceeding 0.90, advancing the automation of cyber threat intelligence.

Contribution

It introduces a two-step approach that combines LLM-based summarization of threat reports with a SciBERT model retrained on a rebalanced dataset augmented with LLM-generated data, improving attack technique extraction.
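The two steps can be sketched as a composition of two functions. This is a minimal sketch, not the authors' implementation: `extract_techniques`, `first_sentences` and `keyword_classifier` are hypothetical names, the sentence-truncation summariser is a stand-in for an LLM, the keyword lookup is a stand-in for the retrained SciBERT classifier, and running the classifier on the summary is an assumption about how the two steps connect.

```python
from typing import Callable, List

# Hypothetical interfaces: a summariser maps a full report to shorter text,
# and a classifier maps text to a list of ATT&CK technique IDs.
Summariser = Callable[[str], str]
Classifier = Callable[[str], List[str]]


def extract_techniques(report: str, summarise: Summariser, classify: Classifier) -> List[str]:
    """Two-step extraction (assumed wiring): summarise the report, then classify the summary."""
    summary = summarise(report)             # step 1: summarisation (an LLM in the paper)
    return sorted(set(classify(summary)))   # step 2: classification, deduplicated and sorted


def first_sentences(report: str, n: int = 2) -> str:
    """Toy stand-in for an LLM summariser: keep only the first n sentences."""
    sentences = [s.strip() for s in report.split(".") if s.strip()]
    return ". ".join(sentences[:n]) + "."


def keyword_classifier(text: str) -> List[str]:
    """Toy stand-in for a fine-tuned SciBERT model: case-insensitive keyword lookup."""
    rules = {"phishing": "T1566", "powershell": "T1059.001"}
    lowered = text.lower()
    return [tid for keyword, tid in rules.items() if keyword in lowered]
```

For example, `extract_techniques("The actor sent phishing emails. A PowerShell loader ran next. Later, data was exfiltrated.", first_sentences, keyword_classifier)` returns `["T1059.001", "T1566"]`. The third sentence never reaches the classifier, which illustrates the trade-off of summarising first: shorter, focused input at the risk of dropping evidence.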

Findings

1. Class imbalance, overfitting, and domain-specific complexity pose significant challenges to accurate technique extraction.
2. The proposed pipeline improves F1-scores over baseline models, with several techniques exceeding 0.90.
3. The work demonstrates enhanced efficiency for web-based CTI systems.
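The F1-score in these findings is the harmonic mean of precision and recall, reported per technique. A minimal pure-Python sketch, assuming each text segment carries exactly one true and one predicted technique label (an assumption; the exact evaluation setup is not described here). The function name `per_class_f1` is hypothetical; scikit-learn's `f1_score` with `average=None` computes the same per-class values.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Per-label F1 from paired single-label ground truth and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # the true label t was missed
    scores: Dict[str, float] = {}
    for label in set(y_true) | set(y_pred):
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores
```

For instance, with three true T1566 segments of which two are recovered, and the third mislabelled as T1059.001, T1566 gets precision 1.0, recall 2/3 and F1 0.8, while T1059.001 gets F1 of about 0.67. Under class imbalance, a rare technique's score can swing sharply on a single prediction, which is one reason rebalancing matters.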

Abstract

This work evaluates the performance of Cyber Threat Intelligence (CTI) extraction methods in identifying attack techniques, as defined by the MITRE ATT&CK framework, from threat reports available on the web. We analyse four configurations using state-of-the-art tools, including the Threat Report ATT&CK Mapper (TRAM) and open-source Large Language Models (LLMs) such as Llama2. Our findings reveal significant challenges, including class imbalance, overfitting, and domain-specific complexity, which impede accurate technique extraction. To mitigate these issues, we propose a novel two-step pipeline: first, an LLM summarises the reports; second, a retrained SciBERT model processes a rebalanced dataset augmented with LLM-generated data. This approach improves F1-scores over baseline models, with several attack techniques surpassing an F1-score of 0.90. Our contributions…
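Rebalancing with LLM-generated data can be sketched as topping up under-represented technique classes with synthetic examples. This is a minimal sketch under stated assumptions: `rebalance_with_augmentation` and its `generate` callback are hypothetical, `generate` stands in for an LLM prompt that writes a new example for a technique from an existing one, and the policy (fill each class up to a fixed `target`, never drop majority-class examples) is an illustrative choice, not necessarily the paper's.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (text, technique_id)


def rebalance_with_augmentation(
    data: List[Sample],
    generate: Callable[[str, str], str],
    target: int,
    seed: int = 0,
) -> List[Sample]:
    """Add generated samples to every class with fewer than `target` examples.

    `generate(seed_text, technique_id)` is a hypothetical stand-in for an LLM call.
    Classes already at or above `target` are left unchanged (no undersampling).
    """
    rng = random.Random(seed)  # fixed seed so the choice of seed texts is reproducible
    by_label: Dict[str, List[str]] = defaultdict(list)
    for text, label in data:
        by_label[label].append(text)
    balanced = list(data)  # keep every original sample
    for label, texts in by_label.items():
        for _ in range(target - len(texts)):  # empty range when the class is large enough
            seed_text = rng.choice(texts)
            balanced.append((generate(seed_text, label), label))
    return balanced
```

Generating new text rather than duplicating existing samples adds lexical variety, which may help against the overfitting noted above; undersampling the majority class would instead discard real data.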

Code & Models

Repositories

hoangcuongnguyen2001/scibert-for-technique-classification
Official
