A Streaming Analytics Language for Processing Cyber Data
Eric L. Goodman, Dirk Grunwald

TL;DR
This paper introduces SAL, a domain-specific language for semi-streaming processing of cyber data, and demonstrates its effectiveness in botnet detection along with a scalable implementation for high-throughput network analysis.
Contribution
The paper presents SAL, a new language for semi-streaming cyber data analysis, and its interpreter SAM, enabling scalable, high-throughput network data processing.
Findings
Achieved an average ROC AUC of 0.87 across a set of botnet detection scenarios
Scaled to 61 nodes (976 cores) at an overall rate of 373,000 netflows/sec
SAL simplifies cyber data analysis tasks
Abstract
We present a domain-specific language called SAL (the Streaming Analytics Language) for processing data in a semi-streaming model. In particular, we examine the use case of processing netflow data in order to identify malicious actors within a network. Because of the large volume of data generated from networks, it is often only feasible to process the data with a single pass, utilizing a streaming (O(polylog n) space requirements) or semi-streaming computing model (O(n polylog n) space requirements). Despite these constraints, we are able to achieve an average of 0.87 for the AUC of the ROC curve for a set of situations dealing with botnet detection. The implementation of an interpreter for SAL, which we call SAM (Streaming Analytics Machine), achieves scaling results that show improved throughput to 61 nodes (976 cores), with an overall rate of 373,000 netflows per second or 32.2…
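To make the single-pass constraint concrete, here is a minimal Python sketch of a per-destination feature computed in one pass over a stream of netflows. It is not SAL code and does not reflect how SAM is implemented; the `Netflow` record, its field names, and the `windowed_mean_bytes` function are hypothetical, chosen only for illustration. The point is that state is kept per destination (a small fixed-size window each) rather than per flow, so memory grows with the number of distinct hosts and not with the length of the stream.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, NamedTuple, Tuple


class Netflow(NamedTuple):
    # Hypothetical minimal record; real netflow data carries many more fields.
    src_ip: str
    dst_ip: str
    num_bytes: int


def windowed_mean_bytes(
    flows: Iterable[Netflow], window: int = 3
) -> Iterator[Tuple[str, float]]:
    """Single pass over the stream.

    For each incoming flow, yield (dst_ip, mean bytes over the most recent
    `window` flows sent to that destination). Each flow is seen exactly once
    and then discarded, apart from its contribution to the window.
    """
    # Per-destination sliding window of byte counts (bounded by `window`).
    recent: Dict[str, Deque[int]] = defaultdict(lambda: deque(maxlen=window))
    # Running sum per destination, so the mean costs O(1) per flow.
    totals: Dict[str, int] = defaultdict(int)

    for flow in flows:
        buf = recent[flow.dst_ip]
        if len(buf) == window:
            # The oldest value is about to be evicted by append().
            totals[flow.dst_ip] -= buf[0]
        buf.append(flow.num_bytes)
        totals[flow.dst_ip] += flow.num_bytes
        yield flow.dst_ip, totals[flow.dst_ip] / len(buf)


if __name__ == "__main__":
    stream = [
        Netflow("10.0.0.1", "10.0.0.9", 10),
        Netflow("10.0.0.2", "10.0.0.9", 20),
        Netflow("10.0.0.1", "10.0.0.8", 100),
    ]
    for dst, mean in windowed_mean_bytes(stream):
        print(dst, mean)
```

Because the function is a generator, it can consume an unbounded iterator (for example, a socket reader) without materializing the stream; a downstream detector would read the yielded features as they arrive.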
Taxonomy
Topics: Network Security and Intrusion Detection · Graph Theory and Algorithms · Advanced Database Systems and Queries
