A Scalable Approach to Solving Simulation-Based Network Security Games
Michael Lanier, Yevgeniy Vorobeychik

TL;DR
MetaDOAR is a scalable reinforcement learning method that improves decision-making in large cyber-networks by efficiently selecting critical nodes and caching evaluations, outperforming state-of-the-art baselines.
Contribution
Introduces MetaDOAR, a novel meta-controller with partition-aware filtering and caching, enabling scalable multi-agent reinforcement learning in large cyber-network environments.
Findings
MetaDOAR achieves higher payoffs than SOTA baselines.
It scales to large networks without significant growth in memory use or training time.
It provides a practical approach to hierarchical policy learning in large networks.
Abstract
We introduce MetaDOAR, a lightweight meta-controller that augments the Double Oracle / PSRO paradigm with a learned, partition-aware filtering layer and Q-value caching to enable scalable multi-agent reinforcement learning on very large cyber-network environments. MetaDOAR learns a compact state projection from per-node structural embeddings to rapidly score and select a small subset of devices (a top-k partition) on which a conventional low-level actor performs a focused beam search guided by a critic. Selected candidate actions are evaluated with batched critic forward passes and stored in an LRU cache keyed by a quantized state projection and local action identifiers, dramatically reducing redundant critic computation while preserving decision quality via conservative k-hop cache invalidation. Empirically, MetaDOAR attains higher player payoffs than SOTA baselines on large network…
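The caching idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `quantize`, `k_hop_neighbors`, and `QCache`, the grid step of 0.1, and the adjacency-dict graph format are all assumptions made for illustration. The sketch shows an LRU cache whose keys combine a quantized state projection with a local (node, action) identifier, and a conservative invalidation step that drops every entry whose node lies within k hops of a changed node.

```python
from collections import OrderedDict, deque


def quantize(projection, step=0.1):
    """Snap each coordinate to a grid so nearby projections share one key.

    The step size is a hypothetical choice, not taken from the paper.
    """
    return tuple(round(x / step) for x in projection)


def k_hop_neighbors(adjacency, source, k):
    """Return all nodes within k hops of source (source included), via BFS."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the k-hop boundary
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


class QCache:
    """LRU cache mapping (state_key, node, action) to a critic Q-value."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()

    def __len__(self):
        return len(self._store)

    def get(self, state_key, node, action):
        key = (state_key, node, action)
        if key not in self._store:
            return None  # cache miss: caller should query the critic
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, state_key, node, action, q_value):
        key = (state_key, node, action)
        self._store[key] = q_value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used

    def invalidate_k_hop(self, adjacency, changed_node, k):
        """Drop every entry whose node is within k hops of changed_node."""
        stale = k_hop_neighbors(adjacency, changed_node, k)
        for key in [key for key in self._store if key[1] in stale]:
            del self._store[key]


# Usage on a hypothetical 4-node path graph 0 - 1 - 2 - 3.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cache = QCache(capacity=2)
state_key = quantize([0.12, 0.49])
cache.put(state_key, 0, "patch", 1.0)
cache.put(state_key, 3, "patch", 2.0)
```

Invalidating a whole k-hop neighborhood is deliberately conservative: it may discard values that are still valid, but it avoids serving a stale Q-value for a device whose local state could have been affected by the change.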
Taxonomy
Topics: Software-Defined Networks and 5G · Reinforcement Learning in Robotics · Access Control and Trust
