# Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes

**Authors:** Linan Huang, Quanyan Zhu

arXiv: 1906.12182 · 2019-11-12

## TL;DR

This paper introduces an adaptive honeypot engagement strategy using reinforcement learning on Semi-Markov Decision Processes to optimize attacker interaction, balancing information gain and risk in cyber defense.

## Contribution

The paper models honeypot engagement as an infinite-horizon SMDP and applies reinforcement learning to it, yielding risk-averse, cost-effective, and time-efficient engagement policies for gathering cyber threat intelligence.

## Key findings

- Adaptive policies attract attackers quickly and sustain engagement.
- The penetration probability is kept at a low level, which limits the risk of attackers exploiting the honeynet.
- With a prudent choice of learning rate and exploration policy, reinforcement learning converges quickly and robustly to the optimal policy and value.

## Abstract

A honeynet is a promising active cyber defense mechanism. It reveals the fundamental Indicators of Compromise (IoCs) by luring attackers to conduct adversarial behaviors in a controlled and monitored environment. The active interaction at the honeynet brings a high reward but also introduces high implementation costs and risks of adversarial honeynet exploitation. In this work, we apply infinite-horizon Semi-Markov Decision Process (SMDP) to characterize a stochastic transition and sojourn time of attackers in the honeynet and quantify the reward-risk trade-off. In particular, we design adaptive long-term engagement policies shown to be risk-averse, cost-effective, and time-efficient. Numerical results have demonstrated that our adaptive engagement policies can quickly attract attackers to the target honeypot and engage them for a sufficiently long period to obtain worthy threat information. Meanwhile, the penetration probability is kept at a low level. The results show that the expected utility is robust against attackers of a large range of persistence and intelligence. Finally, we apply reinforcement learning to the SMDP to solve the curse of modeling. Under a prudent choice of the learning rate and exploration policy, we achieve a quick and robust convergence of the optimal policy and value.
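The abstract's last step, learning an SMDP policy from samples instead of from a known model, can be illustrated with a minimal sketch of tabular Q-learning for a discounted SMDP. This is not the authors' implementation. The two-state honeynet, the two engagement levels, every number in the tables, the exponential sojourn times, the `1/n` learning rate, and the epsilon-greedy exploration are all assumptions made for illustration, and the names `smdp_q_learning` and `greedy_policy` are hypothetical.

```python
import math
import random

# Hypothetical toy honeynet (all values are illustrative assumptions).
# States:  0 = attacker outside the target honeypot, 1 = attacker inside it.
# Actions: 0 = low-interaction engagement, 1 = high-interaction engagement.
TRANSITIONS = [  # TRANSITIONS[s][a] = probabilities of the next state
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.2, 0.8]],
]
REWARD_RATES = [  # reward earned per unit of sojourn time in (s, a)
    [-0.2, -0.5],
    [1.0, 2.0],
]
SOJOURN_RATES = [  # rate of the exponential sojourn time in (s, a)
    [1.0, 2.0],
    [0.5, 0.25],
]


def smdp_q_learning(transitions, reward_rates, sojourn_rates,
                    n_steps=20000, discount=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning for a continuously discounted SMDP."""
    rng = random.Random(seed)
    n_states = len(transitions)
    n_actions = len(transitions[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(n_steps):
        # Epsilon-greedy exploration (an assumed exploration policy).
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])

        # Sample how long the attacker stays and where it moves next.
        tau = rng.expovariate(sojourn_rates[state][action])
        next_state = rng.choices(range(n_states),
                                 weights=transitions[state][action])[0]

        # Reward accrues at a constant rate over the sojourn and is
        # discounted continuously: integral of r * exp(-discount * t) on [0, tau].
        decay = math.exp(-discount * tau)
        reward = reward_rates[state][action] * (1.0 - decay) / discount
        target = reward + decay * max(q[next_state])

        # Learning rate 1/n for the n-th visit of (state, action).
        visits[state][action] += 1
        alpha = 1.0 / visits[state][action]
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action index for each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]
```

Calling `greedy_policy(smdp_q_learning(TRANSITIONS, REWARD_RATES, SOJOURN_RATES))` returns one engagement level per state. The only SMDP-specific part is the random sojourn time `tau`, which sets both the accumulated reward and the discount applied to the next state's value; with a fixed `tau` the update reduces to ordinary Q-learning.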

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.12182/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/1906.12182/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/1906.12182/full.md

---
Source: https://tomesphere.com/paper/1906.12182