Multi-Agent LLM Governance for Safe Two-Timescale Reinforcement Learning in SDN-IoT Defense
Saeid Jamshidi, Negar Shahabi, Foutse Khomh, Carol Fung, and Mohammad Hamdaqa

TL;DR
This paper presents a multi-agent LLM governance framework for safe, auditable, and adaptive reinforcement learning-based SDN-IoT defense, improving stability and performance under attack.
Contribution
It introduces a two-timescale defense approach combining fast mitigation with slow, LLM-driven policy governance for SDN-IoT security.
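The split between fast mitigation and slow governance can be illustrated with a minimal sketch. This is not the authors' implementation: the `Guardrails` fields, the threshold values, and the rule-based `slow_governance` function are hypothetical stand-ins, with a fixed rule in place of the LLM-driven governance step and a simple clipping function in place of the per-switch PPO agents.

```python
from dataclasses import dataclass


@dataclass
class Guardrails:
    """Slow-timescale policy limits (hypothetical field for illustration)."""
    max_drop_fraction: float = 0.5  # cap on the traffic fraction the fast agent may drop


def fast_mitigation(attack_score: float, guardrails: Guardrails) -> float:
    """Fast timescale: pick a drop fraction, clipped to the current guardrail.

    Stand-in for a learned per-switch policy (assumption, not the paper's agent).
    """
    return min(max(attack_score, 0.0), guardrails.max_drop_fraction)


def slow_governance(guardrails: Guardrails, mean_controller_load: float) -> Guardrails:
    """Slow timescale: revise guardrails from aggregated controller telemetry.

    Stand-in for the governance step; thresholds are illustrative only.
    """
    if mean_controller_load > 0.8:
        # Controller is stressed: restrict how aggressive mitigation may be.
        new_cap = guardrails.max_drop_fraction * 0.5
    else:
        # Controller has headroom: relax the cap slightly, up to 1.0.
        new_cap = min(1.0, guardrails.max_drop_fraction + 0.1)
    return Guardrails(max_drop_fraction=new_cap)


def run(attack_scores, loads, governance_period=4):
    """Run the fast loop every step and the slow loop every `governance_period` steps."""
    guardrails = Guardrails()
    actions, window = [], []
    for step, (score, load) in enumerate(zip(attack_scores, loads), start=1):
        actions.append(fast_mitigation(score, guardrails))
        window.append(load)
        if step % governance_period == 0:
            guardrails = slow_governance(guardrails, sum(window) / len(window))
            window.clear()
    return actions, guardrails
```

In this sketch the fast loop never changes its own limits; only the periodic governance step does, which is one simple way to keep policy changes infrequent and reviewable.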
Findings
9.1% improvement in Macro-F1 over PPO
15.4% improvement over static baselines
36.8% reduction in worst-case degradation
Abstract
Software-Defined Networking (SDN) is increasingly adopted to secure Internet-of-Things (IoT) networks due to its centralized control and programmable forwarding. However, SDN-IoT defense is inherently a closed-loop control problem in which mitigation actions impact controller workload, queue dynamics, rule-installation delay, and future traffic observations. Aggressive mitigation may destabilize the control plane, degrade Quality of Service (QoS), and amplify systemic risk. Existing learning-based approaches prioritize detection accuracy while neglecting controller coupling, and they rely on short-horizon Reinforcement Learning (RL) optimization without structured, auditable policy evolution. This paper introduces a self-reflective two-timescale SDN-IoT defense solution that separates fast mitigation from slow policy governance. At the fast timescale, per-switch Proximal Policy Optimization (PPO) agents…
