Event-Driven Temporal Graph Networks for Asynchronous Multi-Agent Cyber Defense in NetForge_RL
Igor Jankowski

TL;DR
This paper introduces NetForge_RL, a high-fidelity cyber operations simulator, and CT-GMARL, a continuous-time graph reinforcement learning method, to improve multi-agent cyber defense with better simulation-to-reality transfer and asynchronous processing.
Contribution
The work presents a simulator (NetForge_RL) and a continuous-time MARL method (CT-GMARL) that together bridge the Sim2Real gap and handle asynchronous cyber defense telemetry.
Findings
CT-GMARL achieves 2x higher median reward than baselines.
It restores 12x more compromised services than the strongest baseline.
Policies transfer effectively to a live Docker environment, retaining a high median reward.
Abstract
The transition of Multi-Agent Reinforcement Learning (MARL) policies from simulated cyber wargames to operational Security Operations Centers (SOCs) is fundamentally bottlenecked by the Sim2Real gap. Legacy simulators abstract away network protocol physics, rely on synchronous ticks, and provide clean state vectors rather than authentic, noisy telemetry. To resolve these limitations, we introduce NetForge_RL: a high-fidelity cyber operations simulator that reformulates network defense as an asynchronous, continuous-time Partially Observable Semi-Markov Decision Process (POSMDP). NetForge enforces Zero-Trust Network Access (ZTNA) constraints and requires defenders to process NLP-encoded SIEM telemetry. Crucially, NetForge bridges the Sim2Real gap natively via a dual-mode engine, allowing high-throughput MARL training in a mock hypervisor and zero-shot evaluation against live exploits in…
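To make the asynchronous, continuous-time framing concrete, the following is a minimal sketch of an event-driven loop in which agents act whenever their own actions complete rather than on a shared tick, together with a reward sum discounted by elapsed time. It is not NetForge_RL code: the function names, the agent names, the fixed action durations, and the exponential discount rate `beta` are all assumptions made for illustration.

```python
import heapq
import math


def run_async_episode(action_durations, horizon=10.0):
    """Return the (time, agent) decision points of a hypothetical async episode.

    action_durations maps a hypothetical agent name to how long each of its
    actions takes. Agents are served in order of completion time, so there is
    no global synchronous tick.
    """
    # Each queue entry is (completion_time, agent_name).
    queue = [(duration, name) for name, duration in action_durations.items()]
    heapq.heapify(queue)
    log = []
    while queue:
        t, name = heapq.heappop(queue)
        if t > horizon:
            # Every remaining entry completes at or after t, so we can stop.
            break
        log.append((t, name))
        # The agent immediately starts its next action (assumed same duration).
        heapq.heappush(queue, (t + action_durations[name], name))
    return log


def discounted_return(timed_rewards, beta=0.1):
    """Sum rewards r received at time t, each weighted by exp(-beta * t).

    This is the usual continuous-time analogue of gamma**k discounting; the
    rate beta is an assumed value, not one taken from the paper.
    """
    return sum(r * math.exp(-beta * t) for t, r in timed_rewards)


if __name__ == "__main__":
    decisions = run_async_episode({"blue_0": 2.0, "blue_1": 3.0}, horizon=6.0)
    for t, name in decisions:
        print(f"t={t:.1f}: {name} acts")
    print(discounted_return([(t, 1.0) for t, _ in decisions]))
```

Because decision points fall at irregular times, the discount depends on elapsed time between events rather than on a step count, which is the practical difference between a semi-Markov formulation and a tick-based one.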
