Beyond Rewards in Reinforcement Learning for Cyber Defence
Elizabeth Bates, Chris Hicks, Vasilios Mavroudis

TL;DR
This paper investigates how different reward structures in reinforcement learning affect the training and effectiveness of cyber defence agents, revealing that sparse, goal-aligned rewards often outperform dense rewards in safety and policy quality.
Contribution
The paper provides a comprehensive evaluation of how reward function design affects RL agents for cyber defence, and introduces a novel ground truth evaluation method that allows direct comparison across reward types.
Findings
Sparse rewards improve training reliability and policy safety.
Sparse rewards lead to more goal-aligned and cost-effective policies.
Dense rewards may bias agents towards risky or suboptimal actions.
Abstract
Recent years have seen an explosion of interest in autonomous cyber defence agents trained to defend computer networks using deep reinforcement learning. These agents are typically trained in cyber gym environments using dense, highly engineered reward functions which combine many penalties and incentives for a range of (un)desirable states and costly actions. Dense rewards help alleviate the challenge of exploring complex environments but risk biasing agents towards suboptimal and potentially riskier solutions, a critical issue in complex cyber environments. We thoroughly evaluate the impact of reward function structure on learning and policy behavioural characteristics using a variety of sparse and dense reward functions, two well-established cyber gyms, a range of network sizes, and both policy gradient and value-based RL algorithms. Our evaluation is enabled by a novel ground truth…
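To make the dense/sparse distinction in the abstract concrete, here is a minimal sketch in Python. It is not the paper's reward definition or any cyber gym's API: the `NetworkState` fields, the penalty weight, and the +1/-1 terminal signal are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NetworkState:
    """Hypothetical snapshot of a defended network at one time step."""
    compromised_hosts: int  # hosts currently under attacker control
    action_cost: float      # cost of the defender's last action


def dense_reward(state: NetworkState, host_penalty: float = 0.1) -> float:
    """Dense signal: every step is penalised for each compromised host
    and for the cost of the action taken (weights are assumptions)."""
    return -(host_penalty * state.compromised_hosts) - state.action_cost


def sparse_reward(state: NetworkState, done: bool) -> float:
    """Sparse signal: zero until the episode ends, then +1 if the
    network is clean and -1 otherwise (an assumed goal definition)."""
    if not done:
        return 0.0
    return 1.0 if state.compromised_hosts == 0 else -1.0


def episode_return(states: List[NetworkState], kind: str) -> float:
    """Sum undiscounted rewards over an episode for the chosen reward kind."""
    total = 0.0
    for t, state in enumerate(states):
        done = t == len(states) - 1
        if kind == "dense":
            total += dense_reward(state)
        else:
            total += sparse_reward(state, done)
    return total


# Illustrative episode: the defender pays for one costly action early
# and ends with a clean network.
episode = [
    NetworkState(compromised_hosts=2, action_cost=0.5),
    NetworkState(compromised_hosts=1, action_cost=0.0),
    NetworkState(compromised_hosts=0, action_cost=0.0),
]
dense_total = episode_return(episode, "dense")
sparse_total = episode_return(episode, "sparse")
```

In this toy episode the dense return is negative even though the defender reaches the goal state, while the sparse return reflects only the final outcome. That is one way hand-tuned per-step penalties can pull an agent away from the intended objective, which is the risk the abstract describes.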
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Information and Cyber Security · Software-Defined Networks and 5G
