From Text to Actionable Intelligence: Automating STIX Entity and Relationship Extraction
Ahmed Lekssays, Husrev Taha Sencar, Ting Yu

TL;DR
This paper presents AZERG, a tool that leverages fine-tuned large language models to automatically extract and structure threat intelligence data in STIX format from unstructured security reports, significantly improving automation and accuracy.
Contribution
The paper introduces AZERG, a novel system that automates STIX data extraction using task-specific fine-tuning of language models, supported by a large annotated dataset.
Findings
Achieved high F1-scores across extraction tasks, e.g., 95.47% for related pair detection.
Demonstrated 2-25% performance improvements over existing methods.
Validated effectiveness on real-world threat analysis reports.
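For readers unfamiliar with the metric, the F1-score quoted above is the harmonic mean of precision and recall. The sketch below shows the standard computation; the function name and the counts are illustrative assumptions and are not taken from the paper's evaluation.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    The arguments are raw counts of extracted items, for example
    entity pairs predicted as related versus the gold annotations.
    """
    predicted = true_pos + false_pos   # everything the model extracted
    actual = true_pos + false_neg      # everything in the gold annotations
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
print(f"F1 = {example_f1:.2%}")
```

A score is high only when both precision and recall are high, which is why it is a common choice for extraction tasks where missed items and spurious items both matter.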
Abstract
Sharing methods of attack and their effectiveness is a cornerstone of building robust defensive systems. Threat analysis reports, produced by various individuals and organizations, play a critical role in supporting security operations and combating emerging threats. To enhance the timeliness and automation of threat intelligence sharing, several standards have been established, with the Structured Threat Information Expression (STIX) framework emerging as one of the most widely adopted. However, generating STIX-compatible data from unstructured security text remains a largely manual, expert-driven process. To address this challenge, we introduce AZERG, a tool designed to assist security analysts in automatically generating structured STIX representations. To achieve this, we adapt general-purpose large language models for the specific task of extracting STIX-formatted threat data. To…
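To make the target of the extraction concrete, the sketch below hand-builds the kind of structured output that STIX describes: two entities and one relationship wrapped in a bundle. It uses only the Python standard library and follows the general shape of STIX 2.1 objects. The sentence, the names "ExampleGroup" and "ExampleRAT", and the helper functions are hypothetical illustrations; this is not AZERG's pipeline or its actual output.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_id(object_type: str) -> str:
    """Build a STIX-style identifier of the form '<type>--<UUID>'."""
    return f"{object_type}--{uuid.uuid4()}"


def stix_timestamp() -> str:
    """Current UTC time with millisecond precision and a 'Z' suffix."""
    now = datetime.now(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def make_object(object_type: str, **properties) -> dict:
    """Create a dict carrying the common STIX 2.1 object properties."""
    ts = stix_timestamp()
    return {
        "type": object_type,
        "spec_version": "2.1",
        "id": stix_id(object_type),
        "created": ts,
        "modified": ts,
        **properties,
    }


# Hypothetical report sentence: "ExampleGroup used ExampleRAT in the campaign."
actor = make_object("threat-actor", name="ExampleGroup")
malware = make_object("malware", name="ExampleRAT", is_family=False)

# The relationship links the two entities by their identifiers.
relationship = make_object(
    "relationship",
    relationship_type="uses",
    source_ref=actor["id"],
    target_ref=malware["id"],
)

bundle = {
    "type": "bundle",
    "id": stix_id("bundle"),
    "objects": [actor, malware, relationship],
}

print(json.dumps(bundle, indent=2))
```

Producing such objects by hand requires an analyst to identify each entity, decide which pairs are related, and label the relationship type, which is the manual effort the abstract describes.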
Taxonomy
Topics: Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies · Semantic Web and Ontologies
