Applying Action Masking and Curriculum Learning Techniques to Improve Data Efficiency and Overall Performance in Operational Technology Cyber Security using Reinforcement Learning
Alec Wilson, William Holmes, Ryan Menzies, Kez Smithson Whitehead

TL;DR
This paper demonstrates that combining curriculum learning and action masking significantly improves data efficiency and performance of reinforcement learning agents in operational technology cyber security scenarios, with faster training and higher rewards.
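Action masking, as it is commonly implemented for policy-gradient agents such as PPO, gives invalid actions zero probability by replacing their logits with a very large negative value before the softmax. The sketch below shows only that general technique. The five-action space, the mask, and the function names `masked_action_probs` and `sample_masked_action` are hypothetical and do not come from IPMSRL or the paper.

```python
import numpy as np


def masked_action_probs(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax over `logits` where actions with mask == False get zero probability."""
    if not mask.any():
        raise ValueError("at least one action must be valid")
    # Replace invalid-action logits with a large negative constant so that
    # exp() underflows to exactly 0.0 after the max-shift below.
    masked_logits = np.where(mask, logits, -1e9)
    # Subtract the max (always a valid action's logit) for numerical stability.
    shifted = masked_logits - masked_logits.max()
    weights = np.exp(shifted)
    return weights / weights.sum()


def sample_masked_action(logits: np.ndarray, mask: np.ndarray, rng: np.random.Generator) -> int:
    """Sample an action index from the masked distribution."""
    probs = masked_action_probs(logits, mask)
    return int(rng.choice(len(probs), p=probs))


# Hypothetical 5-action space in which actions 1 and 3 are invalid in the
# current state (for example, restoring a node that is not compromised).
logits = np.array([0.2, 1.5, -0.3, 2.0, 0.1])
mask = np.array([True, False, True, False, True])
rng = np.random.default_rng(0)

probs = masked_action_probs(logits, mask)
action = sample_masked_action(logits, mask, rng)
print(probs.round(3), action)
```

Because masked actions receive exactly zero probability, the agent never spends samples exploring them, which is plausibly one reason masking helps data efficiency. In a PPO update, the same mask has to be reapplied when recomputing log-probabilities, so that the probability ratio compares the same masked distributions.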
Contribution
It extends the IPMSRL environment with more realistic dynamics (false positive alerts and alert delay) and shows that curriculum learning and action masking improve the performance and data efficiency of RL agents on cyber security tasks.
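The abstract names the two added dynamics as false positive alerts and alert delay. One plausible way to model them is a queue that releases each alert only after a fixed delay and occasionally injects a spurious alert for a clean node. The sketch below rests on illustrative assumptions: the class names `Alert` and `AlertChannel`, a per-step false-positive probability, and a constant delay in timesteps are all hypothetical and not taken from IPMSRL's implementation.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Alert:
    node: int
    raised_at: int       # timestep of the underlying event (or spurious trigger)
    true_positive: bool  # hidden from the agent; kept for evaluation only


class AlertChannel:
    """Hypothetical alert model with a fixed delivery delay and random false positives."""

    def __init__(self, n_nodes: int, delay: int, fp_rate: float, seed: int = 0):
        self.n_nodes = n_nodes
        self.delay = delay        # timesteps between an event and the agent seeing its alert
        self.fp_rate = fp_rate    # per-step probability of one spurious alert
        self.rng = random.Random(seed)
        self.compromised: set[int] = set()
        self.pending: deque = deque()  # alerts in raised_at order, not yet visible

    def step(self, t: int, newly_compromised: set[int]) -> list[Alert]:
        """Advance to timestep t (called with increasing t) and return visible alerts."""
        for node in sorted(newly_compromised):
            self.compromised.add(node)
            self.pending.append(Alert(node, t, True))
        # Possibly raise a false positive on a node that is actually clean.
        clean = [n for n in range(self.n_nodes) if n not in self.compromised]
        if clean and self.rng.random() < self.fp_rate:
            self.pending.append(Alert(self.rng.choice(clean), t, False))
        # Deliver every alert whose delay has elapsed.
        visible = []
        while self.pending and self.pending[0].raised_at + self.delay <= t:
            visible.append(self.pending.popleft())
        return visible


# Node 3 is compromised at t=1; with delay=2 the agent first sees it at t=3.
channel = AlertChannel(n_nodes=6, delay=2, fp_rate=0.0)
seen = {t: channel.step(t, {3} if t == 1 else set()) for t in range(5)}
print({t: [(a.node, a.true_positive) for a in alerts] for t, alerts in seen.items()})
```

Both parameters turn the agent's observation into a noisy, lagged view of the true state: with `delay=2` the attack on node 3 is invisible for two steps, and a nonzero `fp_rate` forces the agent to judge whether an alert is worth acting on.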
Findings
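Curriculum learning increased the mean episode reward in the most difficult environment from -2.791 (baseline) to -0.569.
Action masking increased the mean episode reward in the most difficult environment from -2.791 (baseline) to -0.743.
Combining both methods achieved a mean episode reward of 0.137 in less than 1 million steps.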
Abstract
In previous work, the IPMSRL environment (Integrated Platform Management System Reinforcement Learning environment) was developed with the aim of training defensive RL agents in a simulator representing a subset of an IPMS on a maritime vessel under a cyber-attack. This paper extends the use of IPMSRL to enhance realism, including the additional dynamics of false positive alerts and alert delay. In the most difficult environment tested, applying curriculum learning increased the mean episode reward from a baseline of -2.791 to -0.569. Applying action masking in the same environment increased the mean episode reward from the baseline of -2.791 to -0.743. Importantly, this level of performance was reached in less than 1 million timesteps, which was far more data efficient than vanilla PPO, which reached a lower level of performance…
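The summary does not describe how the curriculum was structured. A common pattern, sketched here only as an illustration, trains on easier environment settings first and advances to harder ones once the mean episode reward over a recent window passes a threshold. The stage names, settings, thresholds, window size, and the `Stage`/`Curriculum` classes are hypothetical; tying difficulty to false-positive rate and alert delay is an assumption suggested by the dynamics the abstract adds, not the paper's confirmed design.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Stage:
    name: str
    fp_rate: float      # hypothetical: probability of a false positive alert per step
    alert_delay: int    # hypothetical: timesteps before an alert becomes visible
    promote_at: float   # mean episode reward required to advance past this stage


class Curriculum:
    """Reward-threshold curriculum: advance when the recent mean reward is high enough."""

    def __init__(self, stages: list[Stage], window: int = 20):
        self.stages = stages
        self.index = 0
        self.window = window
        self.recent = deque(maxlen=window)  # rewards of the most recent episodes

    @property
    def current(self) -> Stage:
        return self.stages[self.index]

    def record(self, episode_reward: float) -> bool:
        """Log one episode's reward; return True if the curriculum advanced."""
        self.recent.append(episode_reward)
        is_last = self.index == len(self.stages) - 1
        if is_last or len(self.recent) < self.window:
            return False
        if mean(self.recent) >= self.current.promote_at:
            self.index += 1
            self.recent.clear()  # judge the new stage on its own episodes only
            return True
        return False


stages = [
    Stage("no noise", fp_rate=0.0, alert_delay=0, promote_at=0.0),
    Stage("some noise", fp_rate=0.05, alert_delay=1, promote_at=-0.5),
    Stage("full difficulty", fp_rate=0.1, alert_delay=3, promote_at=float("inf")),
]
cur = Curriculum(stages, window=3)
for r in [-1.0, -0.5, 0.2, 0.4, 0.6, -0.2, -0.4, -0.3]:
    if cur.record(r):
        print("advanced to", cur.current.name)
```

In a training loop, `cur.current` would supply the settings used to construct or reset the environment at the start of each episode, so the agent only meets the hardest alert dynamics after it has learned a workable policy on easier ones.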
Taxonomy
Topics: Information and Cyber Security
Methods: Entropy Regularization · Proximal Policy Optimization
