TL;DR
HackerSignal is a dataset of 7.45 million documents that links hacker discourse to CVE vulnerabilities. It supports AI-driven cybersecurity research through three benchmark tasks, with an emphasis on temporal out-of-distribution evaluation.
Contribution
The paper introduces HackerSignal, a dataset that maps hacker community discussions to CVE identifiers, enabling benchmarks for AI-based cybersecurity analytics.
Findings
Aggregates 7.45 million exact-deduplicated documents from 64 public source identifiers, spanning eight source layers and 36 years (1990-2026).
Supports three benchmark tasks: CVE linkage, exploit classification, and temporal generalization.
Provides tools and diagnostics for dataset transparency and reuse.
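To make the temporal generalization task concrete, here is a minimal sketch of a time-based split, in which a model is trained on older documents and evaluated on newer ones. The record fields (`id`, `published`) and the cutoff date are assumptions for illustration; the summary above does not state the benchmark's actual schema or split points.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Split records into (train, test) by publication date.

    Records dated strictly before `cutoff` go to train; the rest go to test,
    so the test set only contains documents the model could not have seen.
    """
    train = [r for r in records if r["published"] < cutoff]
    test = [r for r in records if r["published"] >= cutoff]
    return train, test


# Toy records with hypothetical field names, not the dataset's real schema.
records = [
    {"id": "a", "published": date(2015, 3, 1)},
    {"id": "b", "published": date(2019, 7, 12)},
    {"id": "c", "published": date(2023, 1, 5)},
]

# Hypothetical cutoff chosen only for this example.
train, test = temporal_split(records, cutoff=date(2020, 1, 1))
print([r["id"] for r in train], [r["id"] for r in test])
```

Splitting on time rather than at random keeps later documents out of training, which is what makes the evaluation out-of-distribution in the temporal sense.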
Abstract
We introduce HackerSignal, a benchmark for temporal out-of-distribution cyber threat intelligence (CTI) and cross-source CVE linkage. HackerSignal aggregates 7.45 million exact-deduplicated documents from 64 public forum/source identifiers spanning eight source layers and a 36-year window (1990-2026). In contrast to other publicly accessible cybersecurity datasets, HackerSignal is among the first public benchmark datasets to map the full potential exploit-to-vulnerability trajectory across hacker community discourse, exploit databases with working and proof-of-concept exploits, vulnerability advisories, and software fix commits. HackerSignal creates these linkages through a shared CVE identifier space while preserving source-specific release modes, supporting a range of artificial intelligence (AI)-enabled cybersecurity analytics tasks. In this paper, we summarize HackerSignal…
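The idea of linking sources through a shared CVE identifier space can be sketched as follows. This is an illustrative assumption about how such linkage might work, not the authors' pipeline: the record fields (`id`, `layer`, `text`), the layer names, and the toy documents are all hypothetical.

```python
import re
from collections import defaultdict

# CVE IDs have the form CVE-YYYY-NNNN, with four or more digits in the last part.
CVE_PATTERN = re.compile(r"\bCVE-(\d{4})-(\d{4,})\b", re.IGNORECASE)


def extract_cves(text):
    """Return the set of CVE IDs mentioned in text, normalized to upper case."""
    return {f"CVE-{year}-{seq}" for year, seq in CVE_PATTERN.findall(text)}


def link_by_cve(documents):
    """Build an index: CVE ID -> source layer -> list of document ids."""
    index = defaultdict(lambda: defaultdict(list))
    for doc in documents:
        for cve in extract_cves(doc["text"]):
            index[cve][doc["layer"]].append(doc["id"])
    return index


# Toy documents with hypothetical ids and layer names.
documents = [
    {"id": "forum-1", "layer": "forum", "text": "anyone tested cve-2021-44228 on older builds?"},
    {"id": "exploit-7", "layer": "exploit_db", "text": "PoC for CVE-2021-44228"},
    {"id": "advisory-3", "layer": "advisory", "text": "Fixes CVE-2021-45046"},
]

index = link_by_cve(documents)

# Keep only CVEs that appear in at least two distinct source layers.
cross_layer = {cve: dict(layers) for cve, layers in index.items() if len(layers) >= 2}
print(cross_layer)
```

Keying the index by CVE and then by layer keeps each source's documents separate, so records from different layers are joined only through the identifier they share.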
