# A Framework for Budget-Constrained Zero-Day Cyber Threat Mitigation: A Knowledge-Guided Reinforcement Learning Approach

**Authors:** Mainak Basak, Geon-Yun Shin

PMC · DOI: 10.3390/s26010021 · Sensors (Basel, Switzerland) · 2025-12-19

## TL;DR

This paper introduces a framework that combines budget-constrained reinforcement learning with ATT&CK-based cybersecurity knowledge to detect and explain zero-day cyber threats under limited telemetry budgets.

## Contribution

A novel framework integrating ATT&CK-based model generation, budget-constrained reinforcement learning, and causal explanations for zero-day threat mitigation.

## Key findings

- The framework improves detection accuracy at low false-positive rates (low-FPR accuracy) on held-out zero-day threats.
- It improves time-to-detect (TTD) and model calibration compared to conventional methods.
- The system provides traceable explanations for alarms through a Cyber-Threat Knowledge Graph.
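
To illustrate what a traceable explanation through a knowledge graph might look like, the following is a minimal sketch, not the paper's implementation. It assumes a toy graph stored as a plain dictionary and uses breadth-first search to recover the chain of relations linking an alarm to an ATT&CK tactic. The node names (`alarm:42`, `host:web01`), the relation labels, and the `trace` helper are all hypothetical; only the ATT&CK identifiers T1059 and TA0002 are real.

```python
from collections import deque

# Hypothetical toy graph: each node maps to (relation, neighbour) pairs.
# Node names and relation labels are illustrative assumptions only.
EDGES = {
    "alarm:42": [("observed_on", "host:web01")],
    "host:web01": [("exhibits", "technique:T1059")],
    "technique:T1059": [("achieves", "tactic:TA0002")],
    "tactic:TA0002": [],
}


def trace(graph, start, goal):
    """Return the shortest list of (src, relation, dst) hops, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None  # no chain of relations connects start to goal


explanation = trace(EDGES, "alarm:42", "tactic:TA0002")
for src, relation, dst in explanation:
    print(f"{src} --{relation}--> {dst}")
```

Each hop in the returned path is one auditable step, so an analyst can read the explanation as a sequence of graph relations instead of an opaque score.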

## Abstract

Conventional machine-learning-based defenses generalize poorly to novel chains of ATT&CK actions. They are also inefficient under low telemetry budgets and provide neither causal explainability nor auditing. We propose a knowledge-based cyber-defense framework that integrates ATT&CK-constrained model generation, budget-constrained reinforcement learning, and graph-based causal explanation into a single auditable pipeline. The framework formalizes the synthesis of zero-day attack chains using a grammar-formalized ATT&CK database and compiles them into Zeek-aligned witness telemetry, which allows detectors to be trained efficiently on the generated data within limited sensor budgets. The Cyber-Threat Knowledge Graph (CTKG) stores dynamically updated inter-relational semantics between tactics, techniques, hosts, and vulnerabilities, enriching the decision state with causal relations. The sensor budget policy selects sensing and containment decisions within explicit cost and latency bounds. The inherent defense-provenance features enable a traceable explanation of each generated alarm. Extensive evaluations on TTP holdouts of zero-day instances show marked improvements over conventional techniques in low-FPR accuracy, time-to-detect (TTD), and calibration.
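
The idea of choosing sensors within explicit cost and latency bounds can be sketched with a simple greedy heuristic. This is an illustrative stand-in, not the reinforcement learning policy described in the abstract: it ranks sensors by value per unit cost and adds each one only if both budgets still hold. The `Sensor` fields, the numeric values, the additive latency model, and the `select_sensors` helper are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensor:
    name: str
    cost: float     # hypothetical telemetry cost units (must be > 0)
    latency: float  # hypothetical added latency, assumed additive
    value: float    # hypothetical estimate of detection value


def select_sensors(sensors, cost_budget, latency_budget):
    """Greedy selection by value/cost ratio under two hard budgets."""
    chosen, cost, latency = [], 0.0, 0.0
    ranked = sorted(sensors, key=lambda s: s.value / s.cost, reverse=True)
    for s in ranked:
        fits_cost = cost + s.cost <= cost_budget
        fits_latency = latency + s.latency <= latency_budget
        if fits_cost and fits_latency:
            chosen.append(s.name)
            cost += s.cost
            latency += s.latency
    return chosen, cost, latency


# Illustrative sensor names and numbers, not measurements from the paper.
SENSORS = [
    Sensor("conn_log", cost=1.0, latency=5.0, value=3.0),
    Sensor("dns_log", cost=2.0, latency=5.0, value=4.0),
    Sensor("http_log", cost=4.0, latency=20.0, value=5.0),
    Sensor("files_log", cost=3.0, latency=10.0, value=2.0),
]

chosen, cost, latency = select_sensors(SENSORS, cost_budget=6.0, latency_budget=25.0)
print(chosen, cost, latency)
```

With these numbers, `http_log` is skipped because adding it would exceed the cost budget, while the cheaper `files_log` still fits. A learned policy would replace the fixed `value` estimates with state-dependent ones, but the budget check would play the same role.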

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12787384/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12787384/full.md

## References

73 references — full list in the complete paper: https://tomesphere.com/paper/PMC12787384/full.md

---
Source: https://tomesphere.com/paper/PMC12787384