# BERT-spaCy hybrid NLP and blockchain-enhanced adaptive CTI for IOC extraction and threat prediction

**Authors:** Shailendra Mishra, Ruba Ahmed Alfahidah, Fayez Alharbi

PMC · DOI: 10.1038/s41598-025-34505-2 · 2026-03-02

## TL;DR

A hybrid cyber threat intelligence system combines BERT-based IOC extraction, adaptive machine learning, and a blockchain ledger, reporting 95% extraction accuracy and a 55% latency reduction.

## Contribution

A hybrid CTI system that combines a BERT-spaCy extraction pipeline, an immutable blockchain ledger, and adaptive ML models to extract IOCs, predict threats, and share threat data securely, with reported gains in accuracy and latency.
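The tamper-evidence property behind an immutable ledger can be illustrated with a minimal hash chain. This is a sketch under stated assumptions, not the paper's blockchain design: the block fields and the helper names `block_hash`, `append_block`, and `verify_chain` are hypothetical, and consensus, signatures, and networking are omitted.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder hash for the first block


def block_hash(index, prev_hash, payload):
    """Hash the block contents deterministically (sorted JSON keys)."""
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a threat record, linking it to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    block = {
        "index": index,
        "prev": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    }
    chain.append(block)
    return block


def verify_chain(chain):
    """Return True only if every link and every stored hash is intact."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev"], block["payload"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Because each block's hash covers the previous block's hash, editing any stored record changes its recomputed hash, and `verify_chain` then reports the chain as invalid.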

## Key findings

- The BERT-spaCy model achieved 95% accuracy and a 95.7% F1-score for IOC extraction, with a 55% latency reduction (120 ms to 54 ms for 200 reports).
- Results were statistically significant (paired t-tests, t = 3.45, p < 0.001) on the CIC-IDS2017 and UNSW-NB15 datasets.
- BERT had the highest Cross-Dataset Robustness Index (CRI) at 0.999, narrowly ahead of LSTM (0.998) and well ahead of SVM (0.95) and Naïve Bayes (0.92).
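The latency figure can be checked with simple arithmetic, and the paired-test statistics can be sketched with the standard library. The snippet below is illustrative only: the function names are hypothetical, the effect size shown is the paired-samples variant (mean difference divided by the standard deviation of the differences), and the summary does not say which Cohen's d formula the authors used.

```python
import math
import statistics


def latency_reduction(before_ms, after_ms):
    """Fractional reduction in latency relative to the baseline."""
    return (before_ms - after_ms) / before_ms


def paired_t_and_cohens_d(scores_a, scores_b):
    """Paired t statistic and paired-samples Cohen's d for matched scores.

    Assumes both lists hold per-fold scores in the same fold order.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    cohens_d = mean_diff / sd_diff
    return t_stat, cohens_d


# 120 ms down to 54 ms, as reported in the abstract
print(round(latency_reduction(120, 54) * 100, 1))
```

With the abstract's numbers, (120 - 54) / 120 = 0.55, which matches the reported 55% reduction. Converting the t statistic into a p-value requires the t distribution's CDF, which is not in the standard library, so that step is left out here.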

## Abstract

Cyber-attacks pose a significant risk to digital infrastructure, resulting in losses at both individual and organizational levels, underscoring the need for proactive and intelligent defense mechanisms. This study proposes a hybrid Cyber Threat Intelligence (CTI) system integrating an immutable blockchain ledger, adaptive machine-learning models, and natural-language processing algorithms for timely detection, classification, and secure sharing of threat data. The system forecasts future attacks by analyzing aggregated data and recommends mitigation strategies. A BERT-based model, combined with spaCy and regular expressions for extracting Indicators of Compromise (IOCs) from unstructured data, achieved 95% accuracy and a 95.7% F1-score, with a 55% latency reduction (from 120 ms to 54 ms for 200 reports). Validation used 10-fold cross-validation with paired t-tests across 10,000 Monte Carlo simulations (t = 3.45, p < 0.001, Cohen’s d ranging from 0.76 to 1.12, as read from heatmaps) on the CIC-IDS2017 and UNSW-NB15 datasets. The Cross-Dataset Robustness Index (CRI) confirmed strong generalization, with BERT at 0.999, slightly outperforming LSTM (0.998) and clearly outperforming SVM (0.95) and Naïve Bayes (0.92). The system excels in high-volume data processing, event correlation, and threat detection/response rates. This scalable solution suits Security Operations Centers (SOCs), IoT environments, and financial cybersecurity, providing robust unstructured data handling and adaptability to evolving threats.
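The regular-expression layer of such an extraction pipeline can be sketched as follows. This covers only pattern matching, not the BERT or spaCy stages, and every pattern and name here (`IOC_PATTERNS`, `extract_iocs`) is an assumption for illustration; the summary does not list the expressions the authors used.

```python
import re

# Hypothetical patterns for a few common IOC types.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:" + _OCTET + r"\.){3}" + _OCTET + r"\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+"),
}


def extract_iocs(text):
    """Return de-duplicated, sorted matches for each IOC type."""
    found = {}
    for name, pattern in IOC_PATTERNS.items():
        # Strip sentence punctuation that the URL pattern may capture.
        matches = {m.rstrip(".,;:)") for m in pattern.findall(text)}
        found[name] = sorted(matches)
    return found


report = (
    "Beacon to 192.168.10.5 and http://evil.example/payload.exe, "
    "exploits CVE-2021-44228; dropper MD5 d41d8cd98f00b204e9800998ecf8427e."
)
for ioc_type, values in extract_iocs(report).items():
    print(ioc_type, values)
```

Regular expressions suit rigidly formatted indicators such as hashes and CVE identifiers. Context-dependent entities such as malware family names or threat actors are the kind of thing a learned model would be expected to handle in a hybrid design.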

## Full-text entities

- **Diseases:** ATT&CK (OMIM:300831), CTI (MESH:C538142), TTP (MESH:D011697)
- **Chemicals:** IOC (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12960921/full.md

---
Source: https://tomesphere.com/paper/PMC12960921