Large Scale Enrichment and Statistical Cyber Characterization of Network Traffic (Enriquecimiento a gran escala y caracterización cibernética estadística del tráfico de red)
Ivan Kawaminami, Arminda Estrada, Youssef Elsakkary, Hayden Jananthan, Aydın Buluç, Tim Davis, Daniel Grant, Michael Jones, Chad Meiners, Andrew Morris, Sandeep Pisharody, Jeremy Kepner

TL;DR
This paper presents a scalable statistical analysis framework for large-scale enriched network traffic data, revealing key cyber characteristics and activity prevalence to aid analysts.
Contribution
It applies the Python GraphBLAS and PyD4M frameworks to efficient, anonymized analysis of billions of network records, confirming heavy-tail distributions and the concentration of traffic in a small number of activities.
Findings
Most variables follow heavy-tail distributions
A small number of cyber activities dominate traffic
Enriched data enables prioritization of cyber threats
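As a rough illustration of the first two findings, the sketch below measures how concentrated a set of activity labels is and bins the label counts by powers of two, a common way to inspect heavy-tailed data. It uses only the Python standard library rather than Python GraphBLAS or PyD4M, and the function names, label names, and toy records are all hypothetical, not taken from the paper.

```python
from collections import Counter


def top_k_share(labels, k):
    """Fraction of all records covered by the k most common labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(k))
    return top / total


def log2_binned_histogram(counts):
    """Map bin i -> number of labels whose count d satisfies 2**i <= d < 2**(i+1)."""
    hist = Counter()
    for d in counts:
        # For a positive integer d, bit_length() - 1 equals floor(log2(d)).
        hist[d.bit_length() - 1] += 1
    return dict(sorted(hist.items()))


# Hypothetical toy data: one activity label per enriched network record.
records = (
    ["scanner"] * 64
    + ["bruteforce"] * 16
    + ["crawler"] * 4
    + ["tor", "vpn", "worm", "spoof"]
)

label_counts = Counter(records)
share_top2 = top_k_share(records, 2)
hist = log2_binned_histogram(label_counts.values())

print(f"top-2 labels cover {share_top2:.1%} of records")
print("log2-binned label counts:", hist)
```

In this toy data two of the seven labels account for most records, while most labels appear only once. That is the qualitative pattern the findings describe, though real data would need far larger samples and a proper distribution fit.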
Abstract
Modern network sensors continuously produce enormous quantities of raw data that are beyond the capacity of human analysts. Cross-correlation of network sensors increases this challenge by enriching every network event with additional metadata. These large volumes of enriched network data present opportunities to statistically characterize network traffic and quickly answer a key question: "What are the primary cyber characteristics of my network data?" The Python GraphBLAS and PyD4M analysis frameworks enable anonymized statistical analysis to be performed quickly and efficiently on very large network data sets. This approach is tested using billions of anonymized network data samples from the largest Internet observatory (CAIDA Telescope) and tens of millions of anonymized records from the largest commercially available background enrichment capability (GreyNoise). The analysis…
Taxonomy
Topics: Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications · Advanced Proteomics Techniques and Applications
