TL;DR
This paper compares SARIMAX, count-based models such as Poisson regression, and exponential decay functions for short-term forecasting of sparse, bursty vulnerability sightings, and identifies which of them remain practical for cyber threat intelligence.
Contribution
It evaluates the effectiveness of SARIMAX, count-based models, and simple decay functions for short-term forecasting of vulnerability sightings under data constraints.
Findings
SARIMAX models are poorly suited to sparse, short, and bursty sighting series, and their forecasts often come with overly wide confidence intervals.
Count-based models like Poisson regression offer more stable forecasts.
Exponential decay functions provide practical short-term estimates.
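As an illustration of the last finding, here is a minimal sketch of one possible exponential decay forecast. It is not the paper's formulation: the function name `decay_forecast`, the `half_life` parameter, and the exponentially weighted starting level are all assumptions made for this example.

```python
import math


def decay_forecast(counts, horizon, half_life=3.0):
    """Forecast daily sightings by decaying a recent activity level.

    Hypothetical helper: `half_life` (in days) is an assumed tuning
    parameter, not a value taken from the paper.
    """
    if not counts:
        return [0.0] * horizon

    # Decay rate derived from the assumed half-life.
    lam = math.log(2) / half_life

    # Exponentially weighted average of past counts; the most recent
    # observation gets weight 1, older ones progressively less.
    n = len(counts)
    weights = [math.exp(-lam * (n - 1 - i)) for i in range(n)]
    level = sum(w * c for w, c in zip(weights, counts)) / sum(weights)

    # Project the level forward, shrinking it at each step.
    return [level * math.exp(-lam * h) for h in range(1, horizon + 1)]


# Example: a burst of sightings followed by a quiet period.
history = [0, 0, 8, 3, 1, 0]
forecast = decay_forecast(history, horizon=3)
```

A forecast of this kind is never negative and needs only a handful of observations, which is why simple decay curves can stay usable on short, sparse series where a full time-series model struggles.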
Abstract
Understanding and anticipating vulnerability-related activity is a major challenge in cyber threat intelligence. This work investigates whether vulnerability sightings, such as proof-of-concept releases, detection templates, or online discussions, can be forecast over time. Building on our earlier work on VLAI, a transformer-based model that predicts vulnerability severity from textual descriptions, we examine whether severity scores can improve time-series forecasting as exogenous variables. We evaluate several approaches for short-term forecasting of sightings per vulnerability. First, we test SARIMAX models with and without log(x+1) transformations and VLAI-derived severity inputs. Although these adjustments provide limited improvements, SARIMAX remains poorly suited to sparse, short, and bursty vulnerability data. In practice, forecasts often produce overly wide confidence intervals…
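The log(x+1) transformation mentioned in the abstract can be sketched as follows. This is an assumption-laden illustration rather than the authors' pipeline: the helper names are hypothetical, and a plain mean in log space stands in for the fitted forecasting model, which is omitted here.

```python
import math


def log1p_series(counts):
    """Apply log(x + 1) so that zeros stay defined and bursts are compressed."""
    return [math.log1p(c) for c in counts]


def inverse_log1p(values):
    """Map values back to the count scale, clipping at zero."""
    return [max(0.0, math.expm1(v)) for v in values]


# Example series: mostly zeros with one burst.
counts = [0, 0, 12, 3, 1, 0, 0]
transformed = log1p_series(counts)

# Placeholder for a real model: forecast the mean of the transformed series.
forecast_log = sum(transformed) / len(transformed)
forecast = inverse_log1p([forecast_log])[0]
```

Because log(x+1) is concave, a mean taken in log space and mapped back is never larger than the arithmetic mean of the raw counts, so the transformation damps the influence of a single burst on the forecast.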
