TL;DR
This paper introduces T-SPSA, a reinforcement learning algorithm that uses stochastic approximation to learn threshold policies for automated intrusion prevention. Policies are learned in a simulation system, and the overall method pairs it with an emulation system.
Contribution
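The paper formulates intrusion prevention as an optimal multiple stopping problem, shows that optimal policies have threshold properties, and develops T-SPSA, an efficient reinforcement learning algorithm for learning threshold policies that outperforms state-of-the-art algorithms on the authors' use case.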
Findings
T-SPSA outperforms state-of-the-art algorithms on the authors' intrusion prevention use case.
Defender policies are learned incrementally in a simulation system, which the overall method pairs with an emulation system.
The derived threshold policies are practical for securing real-world IT infrastructure.
Abstract
We study automated intrusion prevention using reinforcement learning. Following a novel approach, we formulate the problem of intrusion prevention as an (optimal) multiple stopping problem. This formulation gives us insight into the structure of optimal policies, which we show to have threshold properties. For most practical cases, it is not feasible to obtain an optimal defender policy using dynamic programming. We therefore develop a reinforcement learning approach to approximate an optimal threshold policy. We introduce T-SPSA, an efficient reinforcement learning algorithm that learns threshold policies through stochastic approximation. We show that T-SPSA outperforms state-of-the-art algorithms for our use case. Our overall method for learning and validating policies includes two systems: a simulation system where defender policies are incrementally learned and an emulation system…
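To make the idea of learning threshold policies through stochastic approximation concrete, here is a minimal sketch. It does not reproduce the paper's T-SPSA. It applies generic simultaneous perturbation stochastic approximation (SPSA) to one belief threshold per remaining stop action in a toy environment. The environment dynamics, the reward values, the gain constants, and every function and parameter name (`run_episode`, `spsa_thresholds`, `p_start`, `p_alert`, and so on) are assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    """Map an unconstrained parameter to a threshold in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def threshold_policy(belief, stops_left, theta):
    """Take a stop action iff the intrusion belief reaches the threshold
    associated with the current number of remaining stops."""
    return belief >= sigmoid(theta[stops_left - 1])


def run_episode(theta, rng, horizon=30, p_start=0.1, p_alert=(0.2, 0.7)):
    """Toy environment (assumed, not from the paper): return total reward."""
    num_stops = len(theta)
    stops_left, intrusion, belief, total = num_stops, False, 0.0, 0.0
    for _ in range(horizon):
        # Hidden state: the intrusion starts at a random time and persists.
        intrusion = intrusion or rng.random() < p_start
        alert = rng.random() < p_alert[1 if intrusion else 0]
        # Bayes filter for the belief that an intrusion is in progress.
        prior = belief + (1.0 - belief) * p_start
        like1 = p_alert[1] if alert else 1.0 - p_alert[1]
        like0 = p_alert[0] if alert else 1.0 - p_alert[0]
        belief = prior * like1 / (prior * like1 + (1.0 - prior) * like0)
        if threshold_policy(belief, stops_left, theta):
            stops_left -= 1
            if stops_left == 0:
                break  # the final stop ends the episode
        # Assumed rewards: each stop degrades service; intrusions are costly.
        total += -3.0 if intrusion else stops_left / num_stops
    return total


def estimate_return(theta, rng, episodes=50):
    """Monte Carlo estimate of the expected episode reward."""
    return sum(run_episode(theta, rng) for _ in range(episodes)) / episodes


def spsa_thresholds(num_stops=3, iterations=40, seed=0, a=0.5, c=1.0, big_a=10.0):
    """Generic SPSA ascent on the threshold parameters (illustrative gains)."""
    rng = random.Random(seed)
    theta = [0.0] * num_stops  # sigmoid(0) = 0.5 for every threshold
    for k in range(iterations):
        a_k = a / (k + 1 + big_a) ** 0.602  # step-size schedule
        c_k = c / (k + 1) ** 0.101          # perturbation-size schedule
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        j_plus = estimate_return([t + c_k * d for t, d in zip(theta, delta)], rng)
        j_minus = estimate_return([t - c_k * d for t, d in zip(theta, delta)], rng)
        # Two evaluations give a gradient estimate for all coordinates at once.
        # Parameters are clamped to [-10, 10] to keep sigmoid well-behaved.
        theta = [
            max(-10.0, min(10.0, t + a_k * (j_plus - j_minus) / (2.0 * c_k * d)))
            for t, d in zip(theta, delta)
        ]
    return theta


if __name__ == "__main__":
    learned = spsa_thresholds()
    print([round(sigmoid(t), 3) for t in learned])
```

The sketch optimizes a handful of threshold parameters rather than a full policy network, so each SPSA iteration needs only two noisy evaluations of the objective regardless of how many thresholds there are.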
