Statistical Models for the Number of Successful Cyber Intrusions
Nandi O. Leslie, Richard E. Harang, Lawrence P. Knachel, and Alexander Kott

TL;DR
This paper develops and evaluates generalized linear models (GLMs), especially negative binomial (NB) models, that predict the number of successful cyber intrusions from an organization's characteristics, including its DNS traffic. It identifies the key predictors and the best-fitting model.
Contribution
It introduces a set of GLMs, particularly NB models, for predicting intrusion counts and shows on real data that they fit better than simpler models.
Findings
The negative binomial GLM fits the intrusion data best
DNS traffic to certain TLDs significantly affects intrusion counts
Network security policy violations are strong predictors
Abstract
We propose several generalized linear models (GLMs) to predict the number of successful cyber intrusions (or "intrusions") into an organization's computer network, where the rate at which intrusions occur is a function of the following observable characteristics of the organization: (i) domain name system (DNS) traffic classified by their top-level domains (TLDs); (ii) the number of network security policy violations; and (iii) a set of predictors that we collectively call "cyber footprint" that is comprised of the number of hosts on the organization's network, the organization's similarity to educational institution behavior (SEIB), and its number of records on scholar.google.com (ROSG). In addition, we evaluate the number of intrusions to determine whether these events follow a Poisson or negative binomial (NB) probability distribution. We reveal that the NB GLM provides the best fit…
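The Poisson-versus-NB question raised in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the counts are hypothetical, the models are intercept-only (no TLD, policy-violation, or cyber-footprint predictors), and the NB dispersion is a simple method-of-moments estimate rather than a maximum-likelihood GLM fit. It assumes the common NB2 parameterization, in which the variance is mu + alpha * mu^2, so the Poisson model is the limiting case alpha -> 0.

```python
import math
import statistics

def poisson_logpmf(y, mu):
    """Log-probability of count y under Poisson(mean=mu)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def nb_logpmf(y, mu, alpha):
    """Log-probability of count y under NB2 with mean mu and dispersion alpha.

    Variance is mu + alpha * mu**2; r = 1 / alpha is the NB size parameter.
    """
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))

# Hypothetical intrusion counts for 12 organizations (illustrative only).
counts = [0, 0, 1, 0, 3, 12, 0, 2, 0, 25, 1, 0]

mu_hat = statistics.mean(counts)        # sample mean
var_hat = statistics.variance(counts)   # sample variance
dispersion_ratio = var_hat / mu_hat     # about 1 if the counts were Poisson

# Method-of-moments dispersion: solve var = mu + alpha * mu^2 for alpha.
alpha_hat = (var_hat - mu_hat) / mu_hat ** 2

ll_poisson = sum(poisson_logpmf(y, mu_hat) for y in counts)
ll_nb = sum(nb_logpmf(y, mu_hat, alpha_hat) for y in counts)

# AIC = 2k - 2 * log-likelihood; Poisson has 1 parameter, NB has 2.
aic_poisson = 2 * 1 - 2 * ll_poisson
aic_nb = 2 * 2 - 2 * ll_nb

print(f"variance/mean ratio: {dispersion_ratio:.2f}")
print(f"AIC Poisson: {aic_poisson:.1f}  AIC NB: {aic_nb:.1f}")
```

When the variance-to-mean ratio is well above 1, as in these made-up counts, the Poisson assumption that the variance equals the mean fails and the NB model attains the lower AIC. A full analysis would add the predictors through a log link and fit both GLMs by maximum likelihood.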
