Flurry: a Fast Framework for Reproducible Multi-layered Provenance Graph Representation Learning
Maya Kapoor, Joshua Melton, Michael Ridenhour, Mahalavanya Sriram, Thomas Moyer, Siddharth Krishnan

TL;DR
Flurry is a comprehensive framework that efficiently generates, processes, and utilizes provenance graphs from cyberattack simulations for machine learning-based security analysis, enhancing reproducibility and data availability.
Contribution
It introduces an end-to-end pipeline for transforming system attack data into provenance graphs and supports deep neural network training for cybersecurity applications.
Findings
Successfully processed multiple attack datasets
Enabled anomaly detection through graph classification
Provided a fast, extensible tool for cybersecurity provenance data
Abstract
Complex heterogeneous dynamic networks like knowledge graphs are powerful constructs that can be used in modeling data provenance from computer systems. From a security perspective, these attributed graphs enable causality analysis and tracing for analyzing a myriad of cyberattacks. However, there is a paucity in systematic development of pipelines that transform system executions and provenance into usable graph representations for machine learning tasks. This lack of instrumentation severely inhibits scientific advancement in provenance graph machine learning by hindering reproducibility and limiting the availability of data that are critical for techniques like graph neural networks. To fulfill this need, we present Flurry, an end-to-end data pipeline which simulates cyberattacks, captures provenance data from these attacks at multiple system and application layers, converts audit…
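As a rough illustration of the log-to-graph conversion the abstract describes, the sketch below turns a few audit-style records into a typed, attributed graph and derives simple graph-level counts. The record fields, the `build_provenance_graph` and `summarize` helpers, and the sample events are all hypothetical and do not reflect Flurry's actual log format or API.

```python
from collections import Counter

# Hypothetical audit records; the field names are illustrative assumptions,
# not the log format used by Flurry.
RECORDS = [
    {"ts": 1, "subject": ("process", "bash"), "action": "fork", "object": ("process", "curl")},
    {"ts": 2, "subject": ("process", "curl"), "action": "connect", "object": ("socket", "10.0.0.5:80")},
    {"ts": 3, "subject": ("process", "curl"), "action": "write", "object": ("file", "/tmp/payload")},
    {"ts": 4, "subject": ("process", "bash"), "action": "read", "object": ("file", "/tmp/payload")},
]

def build_provenance_graph(records):
    """Return (nodes, edges) for a typed, attributed, time-ordered graph."""
    nodes = {}   # (type, name) -> integer node id
    edges = []   # (source id, target id, action label, timestamp)
    for rec in sorted(records, key=lambda r: r["ts"]):
        for key in (rec["subject"], rec["object"]):
            nodes.setdefault(key, len(nodes))
        edges.append((nodes[rec["subject"]], nodes[rec["object"]], rec["action"], rec["ts"]))
    return nodes, edges

def summarize(nodes, edges):
    """Count node types and edge labels as a crude graph-level feature vector."""
    node_types = Counter(node_type for node_type, _ in nodes)
    edge_labels = Counter(label for _, _, label, _ in edges)
    return node_types, edge_labels

nodes, edges = build_provenance_graph(RECORDS)
node_types, edge_labels = summarize(nodes, edges)
print(len(nodes), "nodes,", len(edges), "edges")
print(dict(node_types), dict(edge_labels))
```

In a real pipeline, counts like these would typically be replaced by learned node and edge embeddings before graph classification. The sketch only shows the shape of the intermediate data.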
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Advanced Graph Neural Networks
