Network Security Modelling with Distributional Data
Subhabrata Majumdar, Ganesh Subramaniam

TL;DR
This paper presents a machine learning approach that uses static and distributional NetFlow features to detect botnet C2 hosts in large IP traffic datasets. It reports high accuracy, with predictions validated against published denylists of malicious IP addresses and deep packet inspection.
Contribution
It introduces novel distributional features, namely quantiles of the IP-level distributions of NetFlow variables, for improved botnet detection.
Findings
Distributional features enhance detection accuracy.
Model predictions match published denylists of malicious IP addresses.
The models identify botnet C2 hosts with high precision.
Abstract
We investigate the detection of botnet command and control (C2) hosts in massive IP traffic using machine learning methods. To this end, we use NetFlow data -- the industry standard for monitoring of IP traffic -- and ML models using two sets of features: conventional NetFlow variables and distributional features based on NetFlow variables. In addition to using static summaries of NetFlow features, we use quantiles of their IP-level distributions as input features in predictive models to predict whether an IP belongs to known botnet families. These models are used to develop intrusion detection systems to predict traffic traces identified with malicious attacks. The results are validated by matching predictions to existing denylists of published malicious IP addresses and deep packet inspection. The usage of our proposed novel distributional features, combined with techniques that…
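To make the idea of distributional features concrete, the following is a minimal sketch of how quantiles of an IP-level distribution might be computed alongside a static summary. It is not the authors' implementation: the record layout (source IP, bytes per flow), the function name `ip_features`, the choice of quartiles, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, quantiles

# Hypothetical NetFlow records: (source IP, bytes transferred in one flow).
# Real NetFlow exports carry many more fields; this layout is an assumption.
flows = [
    ("10.0.0.1", 100), ("10.0.0.1", 200), ("10.0.0.1", 300),
    ("10.0.0.1", 400), ("10.0.0.1", 500),
    ("10.0.0.2", 40), ("10.0.0.2", 40), ("10.0.0.2", 40),
    ("10.0.0.2", 40), ("10.0.0.2", 4000),
]


def ip_features(records, n=4):
    """Group flow values by IP and return static and quantile features per IP."""
    by_ip = defaultdict(list)
    for ip, value in records:
        by_ip[ip].append(value)

    features = {}
    for ip, values in by_ip.items():
        if len(values) > 1:
            # n=4 yields the three quartile cut points of this IP's distribution.
            cuts = quantiles(values, n=n, method="inclusive")
        else:
            # A single flow has a degenerate distribution: repeat its value.
            cuts = [float(values[0])] * (n - 1)
        features[ip] = {
            "flow_count": len(values),   # static summary
            "mean_bytes": mean(values),  # static summary
            "quantiles": cuts,           # distributional features
        }
    return features


features = ip_features(flows)
for ip, feats in features.items():
    print(ip, feats)
```

In this toy data the second IP has a mean of 832 bytes per flow but a median of 40, so the quantiles expose a skew that the static mean alone hides. The resulting per-IP feature vectors could then be passed to any classifier; which model and which quantile levels the paper uses is not stated in the text above.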
Taxonomy
Topics: Network Security and Intrusion Detection · Network Packet Processing and Optimization · Internet Traffic Analysis and Secure E-voting
