# D3O-IIoT: deep reinforcement learning-driven dynamic deception orchestration for industrial IoT security

**Authors:** Usman Wushishi, Altaf Hussain, Muhammad Imran Khalid, Nasir Hussain, Mona Jamjoom, Zahid Ullah

PMC · DOI: 10.1038/s41598-025-33426-4 · Scientific Reports · 2025-12-21

## TL;DR

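This paper introduces D3O-IIoT, a deep reinforcement learning framework that dynamically orchestrates deception defenses (honeypots, moving target defense, fake telemetry injection, and node isolation) to protect industrial IoT systems against cyber threats.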

## Contribution

The paper contributes D3O-IIoT, a dynamic deception orchestration framework in which a Dueling Deep Q-Network agent selects deception actions for IIoT security.

## Key findings

- D3O-IIoT achieves a 13.7% attack mitigation rate with a 0.3% false alarm rate, outperforming baselines by 293–767%.
- Cross-dataset validation shows 97.7% and 77.8% retention on TON-IoT and WUSTL-IIoT, respectively.
- The learned policy favors isolation (71.2%) for confirmed threats and honeypots (15.4%) for reconnaissance, with a latency of 2.07 ms.
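
The improvement range in the first finding can be put in absolute terms with a small calculation. The sketch below assumes that "outperforming by X%" means relative improvement, (new - old) / old, which the summary does not state explicitly; the helper name `implied_baseline` is illustrative. Under that assumption, the baselines' mitigation rates would fall between roughly 1.6% and 3.5%.

```python
def implied_baseline(new_rate, improvement_pct):
    """Baseline rate implied by a relative improvement of improvement_pct percent.

    Assumes improvement_pct = 100 * (new_rate - baseline) / baseline.
    """
    return new_rate / (1.0 + improvement_pct / 100.0)


d3o_rate = 13.7  # attack mitigation rate (%) reported in the summary

# The largest improvement corresponds to the weakest baseline, and vice versa.
weakest = implied_baseline(d3o_rate, 767)
strongest = implied_baseline(d3o_rate, 293)

print(f"implied baseline range: {weakest:.2f}% to {strongest:.2f}%")
```

These are back-of-the-envelope figures derived from the summary, not values reported by the authors.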

## Abstract

Industrial Internet of Things (IIoT) systems face mounting cyber threats that exploit the resource constraints and operational vulnerabilities of industrial environments. Current intrusion detection schemes rely on static or passive defenses that do not adapt to changing attacks. This paper presents D3O-IIoT, a deep reinforcement learning model that dynamically coordinates deception techniques, including honeypot deployment, moving target defense, fake telemetry injection, and node isolation, on the basis of real-time threat monitoring. The defense problem is formulated as a Markov Decision Process in which a Dueling Deep Q-Network agent maximizes a multi-objective reward that balances attack mitigation, deception engagement, false positive control, and resource cost. Experiments on three IIoT datasets (CIC-IIoT2025, WUSTL-IIoT2021, TON-IoT) show that D3O-IIoT achieves a 13.7% attack mitigation rate with a 0.3% false alarm rate, an improvement of 293–767% (p < 0.0001) over baselines. Cross-dataset validation confirms generalization (97.7% and 77.8% retention on TON-IoT and WUSTL-IIoT, respectively). Ablation results identify false positive control as the most critical reward component (51.4% degradation upon removal), and sensitivity analysis indicates 46.1% tunability through changes to the risk threshold. The learned policy favors isolation (71.2%) for confirmed threats and honeypots (15.4%) for reconnaissance, with a 2.07 ms latency that permits real-time deployment. D3O-IIoT advances IIoT cybersecurity by replacing fixed rule-based defenses with dynamic, learning-based deception orchestration that balances multiple practical goals under resource constraints.
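
To make the formulation in the abstract concrete, here is a minimal sketch of its two named ingredients: the standard dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A) and a weighted-sum reward over the four objectives. The action list follows the abstract, but the function names, the weighted-sum form of the reward, and every weight value are assumptions made for illustration; the paper's actual network and reward definition are in the full text.

```python
# Deception actions named in the abstract; the ordering here is arbitrary.
ACTIONS = ["honeypot", "moving_target_defense", "fake_telemetry", "isolate_node"]


def dueling_q_values(state_value, advantages):
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


def multi_objective_reward(mitigated, engaged, false_positive, resource_cost,
                           w_mit=1.0, w_eng=0.5, w_fp=1.0, w_cost=0.1):
    """Hypothetical weighted-sum reward; the weights are NOT taken from the paper.

    Rewards attack mitigation and deception engagement, and penalizes
    false positives and resource cost.
    """
    return (w_mit * mitigated + w_eng * engaged
            - w_fp * false_positive - w_cost * resource_cost)


def greedy_action(state_value, advantages):
    """Pick the action with the highest Q-value (ties go to the first one)."""
    q = dueling_q_values(state_value, advantages)
    best = max(range(len(q)), key=q.__getitem__)
    return ACTIONS[best]


# Example with made-up network outputs: V(s) = 1.0 and one advantage per action.
print(dueling_q_values(1.0, [0.0, 1.0, 2.0, 5.0]))
print(greedy_action(1.0, [0.0, 1.0, 2.0, 5.0]))
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, which is the usual motivation for the dueling architecture; in a trained agent both streams would come from a neural network rather than from fixed numbers.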

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12816736/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12816736/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/PMC12816736/full.md

---
Source: https://tomesphere.com/paper/PMC12816736